ABSTRACT

Six methods of equating Test of English as a Foreign
Language (TOEFL) test scores for samples consisting of the usual
groups of examinees and groups controlled for native language
representation were evaluated in terms of scale stability. The
equating methods included three item response theory (IRT) variants
(fixed b's scaling, a one-parameter model in which a- and
c-parameters were fixed at constant values, and a model in which all
three parameters were re-estimated), and three conventional equating
methods (Tucker, Levine and equipercentile). The equating methods
were applied to Section II, Structure and Written Expression, and
Section III, Reading Comprehension and Vocabulary. For the regular
group of examinees, fixed b's IRT equating exhibited the greatest
scale stability for both sections with the one-parameter IRT model
and Tucker linear equating following in that order. For most equating
methods, controlling for native language resulted in increased scale
stability relative to the regular group for Section II, but produced
more error in Section III. This interaction may be related to the
differential performance observed among language groups on Section
III in previous studies. Results supported continued use of fixed b's
scaling for TOEFL data using a random sample of examinees from the
total testing group. (Author)
Abstract

Six methods of equating TOEFL test scores for samples consisting of the
usual groups of examinees tested at each TOEFL administration, and groups of
examinees controlled for native language representation, were evaluated in
terms of scale stability. The equating methods included three IRT variants
(fixed b's scaling, a one-parameter model in which a- and c-parameters were
fixed at constant values, and a model in which all three parameters were
re-estimated), and three conventional equating methods (Tucker, Levine and
equipercentile). The equating methods were applied to Section II, Structure
and Written Expression, and Section III, Reading Comprehension and Vocabulary.
For the regular group of examinees, fixed b's IRT equating exhibited the
greatest scale stability for both sections with the one-parameter IRT model
and Tucker linear equating following in that order. For most equating
methods, controlling for native language resulted in increased scale stability
relative to the regular group for Section II, but produced more error in
Section III. This interaction between Section III and the controlled group
may be related to the differential performance observed among language groups
on Section III in previous studies. The results of this study supported
continued use of fixed b's scaling for TOEFL data using a random sample of
examinees from the total testing group.
A Comparative Study of Methods of
Equating TOEFL Test Scores

Introduction

The Test of English as a Foreign Language (TOEFL), which assesses the
English proficiency of foreign students desiring to study at colleges and
universities in the United States and Canada, is comprised of three sections
and seven parts as follows:

I. Listening Comprehension 

A. Statements (20 items) 

B. Dialogues (15 items) 

C. Minitalks (15 items) 

II. Structure and Written Expression

A. Structure (15 items)

B. Written Expression (25 items)

III. Reading Comprehension and Vocabulary

A. Vocabulary (30 items)

B. Reading Comprehension (30 items)

Sections II and III can include 20 and 30 pretest items, respectively,
interspersed among the operational items. An equated score is reported for each
section in addition to a total score. In September 1978, TOEFL adopted item
response theory (IRT) methodology in the form of the three-parameter logistic
model for the purpose of equating test scores in lieu of conventional linear
methods. This implementation of IRT was preceded by a feasibility study which
compared equatings based on IRT parameter estimates with those determined from
conventional methods for reasonable concurrence (Cowell, 1982). An informal
study of scale stability, limited to Sections II and III, was undertaken in
November 1979 in which equatings based on original parameter estimates were
compared with those derived from a chain of calibrations. Small differences
were observed which generated questions that could not be answered by the
limited scale of the November 1979 study. In general, it was not possible to
determine if the results observed were due to (1) some artifact of the scaling
procedures, (2) variability among testing groups, (3) changes in test
specifications over the time spanned by the study, or some combination of
these factors. Underlying these concerns was the fundamental question of the
appropriateness of the IRT model to TOEFL data. Each TOEFL test
administration is comprised of over 100 different language groups of varying
degrees of affinity to English, the effects of which have been assessed in
studies of differential item performance (Angoff & Sharon, 1974; Alderman &
Holland, 1980) and factor structure (Swinton & Powers, 1980). Differences
observed among language groups in these studies suggested that the basic IRT
assumption of the unidimensionality of the latent (ability) space might be
violated by TOEFL data. For example, the Swinton and Powers factor analysis
of Form YTF4, administered in 1976, indicated that Vocabulary and Reading
Comprehension did not constitute a single dimension for non-Indo-European
examinees as it did for Indo-European language groups. For the former,
performance on Reading Comprehension in Section III was more closely allied
to Structure and Written Expression, with Vocabulary defining a separate
factor. Furthermore, the factors underlying Section III were less highly
differentiated for these examinees than for other language groups. In
practice, it appeared that while item parameters might vary among language
groups, such parameters could be estimated which characterized the testing
group as a whole.

Differential language group performance also has implications for the
suitability of some conventional methods of equating TOEFL test scores since
most of these procedures are explicitly or implicitly dependent on random
sampling from a common population. This study was undertaken to determine the
optimal method of equating TOEFL test scores in light of the foregoing
considerations.
Background Information

IRT Scaling and Equating of TOEFL Test Scores

The three-parameter logistic model for item i,

\[ P_i(\theta) = c_i + (1 - c_i)\{1 + \exp[-1.7 a_i(\theta - b_i)]\}^{-1} \qquad (1) \]

specifies the conditional probability of a correct response and requires the
estimation of three item parameters, a, b and c, and an ability estimate. A
measure of the discriminating power of the item, the a-parameter is related to
the slope of the item curve at the point of inflection. The b-parameter is
that value on the ability scale at which the item curve lies midway between
its upper and lower asymptotes. As a location parameter, it is an index of the
item difficulty. The c-parameter is the value of the ordinate at the lower
asymptote of the item curve and represents a measure of the tendency to guess
on the item.
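As an illustration of equation (1), the following minimal sketch evaluates the item curve for a single item. The function name and the parameter values are hypothetical and are not taken from TOEFL data.

```python
import math

D = 1.7  # scaling constant appearing in equation (1)


def p_correct(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical item: a = 1.0 (discrimination), b = 0.0 (difficulty), c = 0.2 (lower asymptote).
# At theta = b the curve lies midway between the asymptotes c and 1, i.e. at (1 + c) / 2.
print(p_correct(0.0, 1.0, 0.0, 0.2))
```

For very low ability the probability approaches c, and for very high ability it approaches 1, which is the behavior described for the lower and upper asymptotes above.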

Using LOGIST (Wood, Wingersky & Lord, 1976), TOEFL parameters are
estimated such that thetas are scaled to mean zero and standard deviation one,
with b's on the theta scale. If another group of examinees were administered
the same items and a similar scaling were applied, any differences in level and
spread of ability between the two groups would result in dissimilar values of
the b's. The invariance of item parameters across groups and theta estimates
across tests will hold only if parameter estimates derived from subsequent
groups are placed on some established scale. If a set of items has been
scaled on a given group of examinees, estimates based on successive groups can
be linearly transformed to the established scale. When old and new forms are
linked by a block of common items, the slope and intercept parameters of the
line relating the b's can be used to scale all the items in the new form
(Marco, 1977). Stocking and Lord (1982) have developed a linear
transformation which results from the minimization of the average squared
difference between true score estimates and have reported favorable results
for this method.
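The linear transformation of parameter estimates to an established scale can be sketched as follows. The line relating the b's is obtained here by matching the mean and standard deviation of the common-item difficulties (often called the mean/sigma approach); this choice is an assumption made for illustration and is not a statement of how the line was fitted in the studies cited, nor is it the Stocking and Lord characteristic curve method. All names and values are hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma_line(b_new, b_old):
    """Slope A and intercept B such that A * b_new + B has the mean and SD of b_old.

    b_new: common-item difficulties estimated on the new group's scale.
    b_old: the same items' difficulties on the established scale.
    """
    A = pstdev(b_old) / pstdev(b_new)
    B = mean(b_old) - A * mean(b_new)
    return A, B


def rescale_item(a, b, c, A, B):
    """Place one item's (a, b, c) on the established scale.

    Under theta* = A * theta + B, difficulties transform like theta,
    discriminations are divided by A, and c is unchanged.
    """
    return a / A, A * b + B, c


# Hypothetical common items on the two scales.
A, B = mean_sigma_line(b_new=[-0.5, 0.0, 0.5], b_old=[-1.0, 0.0, 1.0])
print(A, B)
print(rescale_item(1.0, 0.25, 0.2, A, B))
```

Because a(theta - b) is unchanged by this transformation, the probability in equation (1) is the same on either scale.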

Current TOEFL scaling procedures do not depend on a block of items common
to two forms; instead calibrated (scaled) pretested items, selected from many
previous test forms, serve as the equating items in each version of the test.
During parameter estimation, the a- and c-parameters for the calibrated items
are re-estimated, but the b-parameters are held fixed at the values derived in
the initial calibrations. In this procedure, referred to as "fixed b's"
scaling, the presence of the precalibrated items sets the scale for the
noncalibrated items. In common item equating, the equating items are selected
to be representative of the total test in content and other specifications;
however, the precalibrated items in the fixed b's scaling are chosen to span
the range of difficulty and discriminating power of the total test. Implicit
in this procedure is the basic IRT assumption that the estimates of
difficulties will hold for all testing groups except for scale factors. Plots
of item curves, on which were superimposed squares representing the observed
proportions of examinees at a given ability level responding correctly to the
items (item ability regressions), indicated that, on occasion, some of the
precalibrated items (items for which the b's were held fixed) did not
adequately reflect the response patterns of the current examinee group. The
fit of the newly calibrated items was usually quite satisfactory. (Examples of
item ability regressions are given in Figures 13-16.)

Once the item parameters are on scale, it is only necessary to calculate
the sum of the item curves, the test characteristic function, which specifies
true scores as a function of ability. Scores on two tests are then considered
equivalent if they depend on the same value of theta. Additional information
regarding true score equating (Lord, 1980, pp. 199-205) as applied to TOEFL is
outlined in Appendix A.
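The true score equating idea described above can be sketched in a few lines: find the theta at which the test characteristic function of form X equals a given true score, then evaluate the test characteristic function of form Y at that theta. This is a minimal sketch assuming both forms' item parameters are already on a common scale; the bisection search, the function names, and the item values are illustrative assumptions and do not reproduce the operational TOEFL programs.

```python
import math


def p3pl(theta, a, b, c, D=1.7):
    """Item curve of equation (1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_score(theta, items):
    """Test characteristic function: the sum of the item curves at theta."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def theta_for_true_score(score, items, lo=-8.0, hi=8.0, tol=1e-8):
    """Invert the monotone increasing test characteristic function by bisection."""
    if not (true_score(lo, items) < score < true_score(hi, items)):
        # In particular, true scores at or below the sum of the c's have no theta.
        raise ValueError("score is outside the true scores reachable on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if true_score(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def equate_true_score(score_x, items_x, items_y):
    """True score on Y that corresponds to the same theta as score_x on X."""
    theta = theta_for_true_score(score_x, items_x)
    return true_score(theta, items_y)


# Hypothetical three-item forms as (a, b, c) triples; form Y is slightly easier.
form_x = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.7, 0.2)]
form_y = [(1.0, -1.0, 0.2), (1.2, -0.5, 0.2), (0.8, 0.2, 0.2)]
print(equate_true_score(2.0, form_x, form_y))
```

Since form Y in this example is easier, a true score of 2.0 on X corresponds to a somewhat higher true score on Y.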

Some Conventional Equating Methods 

Conventional methods of equating include linear and equipercentile
equating, which are defined as follows:

equipercentile equating: For a given group of
examinees, two scores on separate forms of a test
are considered equivalent if their percentile ranks
are equal.

linear equating: For a given group of examinees, two
scores on separate forms of a test are considered
equivalent if they correspond to equal standard score
deviates (Angoff, 1982).
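The two definitions above can be illustrated for the simplest case, in which a single group has scores on both forms. The sketch below is a simplification under stated assumptions: percentile rank is taken as the percent of scores below a value plus half the percent equal to it, scores are integers, and no smoothing or interpolation is applied. The function names and data are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(x, scores_x, scores_y):
    """Score on Y with the same standard score deviate as x has on X."""
    z = (x - mean(scores_x)) / pstdev(scores_x)
    return mean(scores_y) + z * pstdev(scores_y)


def percentile_rank(x, scores):
    """Percent of scores below x plus half the percent of scores equal to x."""
    below = sum(s < x for s in scores)
    equal = sum(s == x for s in scores)
    return 100.0 * (below + 0.5 * equal) / len(scores)


def equipercentile_equate(x, scores_x, scores_y):
    """Integer Y score whose percentile rank is closest to that of x on X."""
    target = percentile_rank(x, scores_x)
    candidates = range(min(scores_y), max(scores_y) + 1)
    return min(candidates, key=lambda y: abs(percentile_rank(y, scores_y) - target))


# Hypothetical integer raw scores for one group on two forms.
scores_x = [10, 12, 14, 16, 18]
scores_y = [20, 24, 28, 32, 36]
print(linear_equate(16, scores_x, scores_y))
print(equipercentile_equate(16, scores_x, scores_y))
```

In this example the two score distributions differ only by a linear change of scale, so the two methods agree; they diverge when the shapes of the distributions differ.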
In the usual testing situation where separate groups take the two test forms,
the strategy utilized in implementing these definitions involves the formation
of a synthetic equating population, T, as a weighted composite of the two
testing groups, P, the group taking new form X, and Q, the group taking old
form Y:

\[ T = w_1 P + w_2 Q \qquad (2) \]

where w_1 and w_2 are weights assigned to the two groups.

In the case of common item equating, information derived from a set of
items, V, common to old form Y and new form X aids in determining the
distributions and first two moments of the synthetic equating group. The item
set, V, is commonly called the anchor or equating test. Some details of these
procedures are given in Appendix B; however, the development of the
distributions for population T requires the following assumptions:

a) For equipercentile equating, the conditional
distribution of Y given V = v is the same in groups P
and Q, with a similar assumption for form X,

\[ F_P(Y \mid v) = F_Q(Y \mid v) \qquad (3) \]

\[ G_P(X \mid v) = G_Q(X \mid v) \qquad (4) \]

b) For Tucker linear equating, the regressions
of Y on V, and X on V, are the same for P and Q,

\[ E_P(Y \mid v) = E_Q(Y \mid v) = av + b \qquad (5) \]

\[ E_P(X \mid v) = E_Q(X \mid v) = cv + d \qquad (6) \]

and the conditional variances are equal in P and Q,

\[ \mathrm{Var}_P(Y \mid v) = \mathrm{Var}_Q(Y \mid v) = \sigma^2_{Y \cdot V} \qquad (7) \]

\[ \mathrm{Var}_P(X \mid v) = \mathrm{Var}_Q(X \mid v) = \sigma^2_{X \cdot V} \qquad (8) \]
Thus, Tucker linear equating, as well as equipercentile equating, is based on
untestable assumptions since data for the old form in Q or for the new form in
P are not available. Braun and Holland (1982) have noted that assumptions (5)
through (8) may not be satisfied if the regression system depends on a
measurable extraneous variable such as some student background characteristic.
Data collected for the period September 1980 through June 1981 reflecting the
language group composition for ten administrations of TOEFL indicated, among
other things, that language group representation varies across
administrations. To the extent that language group membership is related to test
performance on TOEFL, these variations may cause the assumption of equality of
regressions in Tucker linear equating to be violated. Indeed, Levine (1955)
has demonstrated by experiment that the invariance of regression parameters
will not hold on parallel tests if the selection of samples is on variables
external to the equating design, observing that if the assumptions which are
made in deriving the mathematical model for equating are not satisfied, it is
probable that its application will result in biased equivalent scores. Levine
derived equations to equate tests which have been administered to samples that
differ in dispersion and level of ability due to selection on variables
extraneous to the equating experiment. His assumptions were presented in
terms of the invariance of the true score regression system with the
additional constraint that V be parallel to X and Y (Levine, 1955; Angoff, 1961).
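To make the role of assumptions (5) through (8) concrete, the sketch below computes a Tucker linear conversion from anchor test data. The synthetic population moments follow the form in which the Tucker method is commonly stated in the equating literature; the report's own derivation is in Appendix B and is not reproduced here, so the formulas, the default weights (proportional to sample size), and all names and data should be read as assumptions for illustration.

```python
from statistics import mean, pvariance


def _cov(u, v):
    """Population covariance of two equal-length score lists."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / len(u)


def tucker_linear(x_p, v_p, y_q, v_q, w1=None):
    """Return (slope, intercept) converting form X scores to the form Y scale.

    x_p, v_p: new form and anchor scores for group P.
    y_q, v_q: old form and anchor scores for group Q.
    w1: weight for P in the synthetic population T = w1*P + w2*Q.
    """
    if w1 is None:
        w1 = len(x_p) / (len(x_p) + len(y_q))  # assumed default: proportional to sample size
    w2 = 1.0 - w1
    g1 = _cov(x_p, v_p) / pvariance(v_p)  # slope of the regression of X on V in P
    g2 = _cov(y_q, v_q) / pvariance(v_q)  # slope of the regression of Y on V in Q
    dmu = mean(v_p) - mean(v_q)
    dvar = pvariance(v_p) - pvariance(v_q)
    # Moments of X and Y in the synthetic population T.
    mu_x = mean(x_p) - w2 * g1 * dmu
    mu_y = mean(y_q) + w1 * g2 * dmu
    var_x = pvariance(x_p) - w2 * g1 ** 2 * dvar + w1 * w2 * g1 ** 2 * dmu ** 2
    var_y = pvariance(y_q) + w1 * g2 ** 2 * dvar + w1 * w2 * g2 ** 2 * dmu ** 2
    slope = (var_y / var_x) ** 0.5
    return slope, mu_y - slope * mu_x


# Hypothetical data: identical anchor performance, form Y two points easier than form X.
x_p = [10.0, 12.0, 15.0, 18.0, 20.0]
v_p = [3.0, 4.0, 5.0, 6.0, 7.0]
y_q = [s + 2.0 for s in x_p]
print(tucker_linear(x_p, v_p, y_q, v_p))
```

When the two groups perform identically on the anchor test, the adjustment terms vanish and the conversion reduces to ordinary linear equating of the observed means and standard deviations.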

Objectives of the Study

The major objective of this study was the determination of the method of
equating TOEFL test scores that will best maintain the stability of the score
scale over time, given the variable nature of its testing population. On the
assumption that the period of time defined by this study would include test
forms for which the test specifications were relatively constant, the
following research questions were investigated:

1. What are the effects of population variability?
Can they be eliminated by defining an equating group
controlled for native language representation?

2. Will alternate methods of scaling IRT parameters produce more
stable results than those presently employed? Will a simplified
IRT equating model produce better results with TOEFL
test scores?

3. How do conventional linear and curvilinear methods
compare with IRT equating for the TOEFL population?
Methods

Selecting the Controlled Group

The initial phase of this study sought to determine a set of criteria by
which a controlled equating group could be formed in order to compare a
variety of equating models with groups as they naturally arise in the TOEFL
testing program. In particular, it was required to define a group whose
proportionate native language representation could be replicated at each of
the experimental administrations. It was hoped that subsets of language
groups with similar performance profiles (i.e., similar rank-ordering of item
difficulties) could be identified in the expectation of simplifying the
sampling process. Preliminary analysis indicated that this approach would not
be successful by virtue of variations in item difficulties even among language
groups believed to be closely allied. Somewhat similar results were observed
in a study of TOEFL item bias by Alderman and Holland (1981); consequently,
this approach was not pursued further. It also became clear that if it were
possible to identify clusters of language groups with similar performance
profiles, the composition of these groups would differ for the two
sections.

Data were collected on native language groups representing at least one
percent of each administration for examinees taking TOEFL at domestic centers
for the year previous to this study. These data indicated that large
differences existed in monthly representation for Chinese, Arabic, Farsi,
Spanish and Japanese speakers. To the extent that item parameter estimates
differed among language groups, those estimates might be unduly influenced by
over-representation of one or more native languages at any given
administration. Likewise, these variations may also violate assumptions basic
to some methods of linear equating. The minimum proportions observed in the
year preceding the study (to assure availability) for each language group were
tallied, and a group controlled for native language representation was selected
at each administration as given in Table 1.
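The selection of a group with fixed proportionate language representation can be sketched as a quota sample. The source does not describe the sampling program itself, so the procedure below (simple random sampling within each language group up to a rounded quota) is an assumption, and the function name, language labels, and proportions are hypothetical rather than the Table 1 values.

```python
import random


def draw_controlled_group(examinees, proportions, group_size, seed=0):
    """Sample examinees so each language group fills its fixed share of the group.

    examinees: list of (examinee_id, native_language) pairs for one administration.
    proportions: dict mapping native_language to its target proportion.
    group_size: intended size of the controlled group.
    """
    rng = random.Random(seed)
    by_language = {}
    for examinee_id, language in examinees:
        by_language.setdefault(language, []).append(examinee_id)

    sample = []
    for language, share in proportions.items():
        quota = round(share * group_size)
        pool = by_language.get(language, [])
        if quota > len(pool):
            # The target proportions must be attainable at every administration.
            raise ValueError(f"not enough {language} examinees: need {quota}, have {len(pool)}")
        sample.extend(rng.sample(pool, quota))
    return sample


# Hypothetical administration with three language groups of 50 examinees each.
examinees = [(f"{lang}{k}", lang) for lang in ("A", "B", "C") for k in range(50)]
group = draw_controlled_group(examinees, {"A": 0.5, "B": 0.3, "C": 0.2}, group_size=10)
print(len(group))
```

Using the minimum proportion observed for each language group, as described above, is what keeps each quota attainable at every administration.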
Table 1.

Native Language Representation for Controlled Equating Group

Language Group

Proportion of Total Controlled Group

Arabic 

Bengalese 

Chinese 

Farsi

French 

German 

Greek 

Hindi 




.207 
.006 
.150 
.079 
.018 
.008 
.020 
.008 
.003 
.020 
.110 
.060 
.020
.010 
.016 
.206 
.007 
.017 
.010 
.009 
.016 
Equating Design

Each of the seven experimental administrations of this study was comprised
of several subtests which included both operational and pretest items.
Section I contained no pretest items; thus, it was not included in this study.
The pretest slots of one subtest at each administration contained operational
items from the previous form, thus defining an equating chain as shown in
Table 2. For some of the equating models this resulted in anchor tests which
were internal to the old form and external to the new form. For Sections II
and III six types of equating were included in this study as follows:

1. Modified IRT: a- and c-parameters were held fixed at
values determined to be representative of current TOEFL data;
only the b's were estimated. For Section II, a was fixed at
1.00 and c at .19. For Section III, the fixed value of a was
1.03 and c was .20. Parameters were scaled using the Stocking
and Lord characteristic curve transformation (Stocking & Lord,
1982) based on a set of items common to two forms.

2. Fixed b's IRT: This replicated the current TOEFL
operational scaling procedures as previously described; b's for
the equating or precalibrated items were held fixed at
pretested values, only a- and c-parameters were re-estimated
on this item set, and all three parameters were estimated for the
remaining noncalibrated items. The equating items were
selected from many previous forms.

3. Three parameters re-estimated: A set of items common to
an old and new form facilitated the scaling of all the items.
All three parameters were estimated on the new form. As in the
case of Modified IRT, using a set of common items, a line
relating the parameters of the old (scaled) form and the new
form was calculated based on the Stocking and Lord
characteristic curve transformation. The parameters of this
line were used to place all other items on scale.
Table 2.

Equating Item Links for the TOEFL Experimental Forms

Form          Operational    Pretest Slots

3BTF11        a, a*
3DTF9         b              a
3DTF10        c              b
3ETF1         d              c
3ETF2         e              d
3ETF3         f              e
3ETF4         g              f
3ETF6(37)                    g
3ETF6(38)                    a*
4. Tucker linear equating: Tucker parameters were used
throughout the chain of equatings.

5. Levine linear equating: Levine parameters were used
throughout the equating chain.

6. Equipercentile equating.

TOEFL form 3BTF11, administered in November 1979, was the only relatively
recent test edition with linear parameters linked to the TOEFL scale and was
therefore chosen as the base equating form in this study. For each IRT and
conventional equating condition, a separate 3BTF11 scale was established. For
all IRT equatings the experimental form was equated to the appropriate version
of the base form. The links served only for the purpose of scaling in the
Modified IRT and 3-parameters re-estimated models, while in the three
conventional equating methods each experimental subtest was equated to the previous
form in the chain. The equating group for the fixed b's method was a spaced
sample across all subtests of the experimental forms. All other equating
groups were necessarily based on the single subtest which served as the link.
The input parameters for the regular and controlled groups in fixed b's
scaling were operational TOEFL data, i.e., derived from the regular
TOEFL testing population. IRT equatings were derived from operational TOEFL
computer programs.

A sample size of approximately 1,000 is required for reliable estimation
in the three-parameter model. As a result, the 3-parameters re-estimated
model could not be run on the controlled group since, for some administrations,
this group represented about one-third (about 300) of the examinees taking any
linked subtest form. Consequently, the following conditions were observed in
this study:

                              Regular Gr.    Controlled Gr.

Modified IRT                       X               X
3-parameters re-estimated          X
Fixed b's                          X               X
Tucker                             X               X
Levine                             X               X
Equipercentile                     X               X
Analysis of Results

The design provided for an empirical evaluation of the stability of the
various equating conditions by utilizing two subtests, 37 and 38, of the final
experimental form, 3ETF6. Accordingly, the following items were included in
these subtests:

1. A set of items linked to the previous form in
the equating chain (subtest 37),

2. A set of items from 3BTF11 as a direct link
to the base form (subtest 38).

The equatings derived from the direct link served as the criterion against
which each equating chain would be compared using a discrepancy index
developed by N. Petersen (Petersen, Marco and Stewart, 1982), and a computer
program written by staff in College Board Statistical Analysis. The index is
a weighted mean square difference decomposed into the variance of the
difference and the squared bias. Thus, if d_i = (t_i* - t_i), where for raw score
i, i = 0, 1, ..., n, t_i* and t_i are converted scores corresponding to the
criterion and chain equatings respectively, and f_i is the number of examinees
at each score level, then

\[ \sum_i f_i d_i^2 / n = \sum_i f_i (d_i - \bar{d})^2 / n + \bar{d}^2 \qquad (10) \]

Total Error = Variance of Difference + Squared Bias,

where \bar{d} is the weighted mean of the d_i.

Optimum conditions for the criterion comparisons include equivalent samples,
and anchor tests of equal difficulty for the two subtests. All equating
comparisons were based on independent samples taking the two 3ETF6
experimental subtest forms. The one exception was fixed b's equating in the
controlled group, in which case the single subtest samples were not of
sufficient size to estimate parameters. Consequently a methodology was
adopted which simultaneously estimates a large number of items taken by more
than one group of examinees (Lord, 1980, pp. 205-206). Comparisons involving
equipercentile equatings were limited to the range of scores actually observed.
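The decomposition of the discrepancy index into variance and squared bias can be checked numerically with a short sketch. It takes the divisor to be the total number of examinees (the sum of the frequencies), which is an assumption since the extracted text does not define it clearly; the function name and the data are hypothetical and this is not the College Board program mentioned above.

```python
def discrepancy_index(criterion, chain, freq):
    """Return (total error, variance of difference, squared bias).

    criterion: converted scores from the direct (criterion) equating, by raw score.
    chain: converted scores from the chain equating, by raw score.
    freq: number of examinees at each raw score level.
    """
    total_n = sum(freq)  # assumed divisor: total number of examinees
    d = [t_star - t for t_star, t in zip(criterion, chain)]
    bias = sum(f * di for f, di in zip(freq, d)) / total_n
    total = sum(f * di ** 2 for f, di in zip(freq, d)) / total_n
    variance = sum(f * (di - bias) ** 2 for f, di in zip(freq, d)) / total_n
    return total, variance, bias ** 2


# Hypothetical converted scores at three raw score levels.
total, variance, sq_bias = discrepancy_index(
    criterion=[10.0, 20.0, 30.0], chain=[9.0, 20.0, 32.0], freq=[1, 2, 1]
)
print(total, variance, sq_bias)
```

The total error equals the sum of the other two components, so a chain can show a small total error either because its conversions scatter little around the criterion or because its average departure is small.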

Results

Description of Samples Used in the Study

Raw score data. Raw score data for samples in this study are given in
Table 3 (R and C refer to regular and controlled groups, respectively).
Numbers in this table depend on selection criteria in the computer programs on
which these data were based (LOGIST eliminates examinees with perfect scores;
the item analysis program produces samples based on a factor of five). Sample
sizes reflect an effort to maximize the reliability of the item parameter
estimates consistent with the sampling proportions for the controlled group.
With the exception of the criterion comparisons for the regular group, a
spaced sample was taken across all subtests for fixed b's scaling; all other
methods depended on a single subtest. In the 3BTF11 samples, the controlled
groups slightly outperformed the regular group and were somewhat more
variable. For the experimental forms, the controlled groups were less able
overall.

Analysis of regressions of total test on anchor test. Additional
information regarding differences between groups can be elicited from the
regression of total score on the equating items score. Since the equating
items enter into the determination of the converted scores in various ways,
analysis of these regressions may illuminate the nature of the differences
between some of the equating results for the two groups. These data,
including probabilities associated with the null hypotheses usually tested in
an analysis of covariance, as calculated from the Wilks-Gulliksen Ancova
program (Gulliksen and Wilks, 1950), are presented in Table 4. P(A), P(B) and
P(C) represent the degree of confidence in accepting the following hypotheses:

P(A) = P_H0 [equal errors of estimate]

P(B) = P_H0 [equal slopes]

P(C) = P_H0 [equal intercepts].
Table 3.

Raw Score Means, Standard Deviations and Sample Sizes
for All Equating Groups

                                     II                III
Form        Group       N       Mean    S.D.      Mean    S.D.

Base Form - IRT Equating

3BTF11        R     14068      23.27    7.01     31.13   10.13
              C      7501      23.80    6.80     32.52   10.04

Base Form - Linear Equating

3BTF11        R      1580      23.88    (•00     32.03   10.20
              C      2115      21.09    6.93     32.66    9.99

Experimental Forms - Fixed b's

3DTF9         R      2283      25.30    6.86     35.77   10.05
              C      1785      25.23    6.86     35.12    9.83
3DTF10        R      1159      25.21    6.68     31.71   10.60
              C      1115      21.69    6.77     32.92   10.97
3ETF1         R      2271      25.15    7.00     36.09   10.15
              C      1819      21.19    7.28     35.12   10.27
3ETF2         R      1771      26.27    6.81     37.10   10.01
              C      1615      25.72    6.81     36.15    9.96
3ETF3         R      2126      25.31    7.09     31.38    9.16
              C      1871      21.76    7.07     33.83    9.15
3ETF4         R      2330      25.38    6.62     37.28    9.65
              C      1709      25.02    6.51     36.71    9.87
3ETF6(37)     R      1011      26.35    6.29     38.20    8.66
              C      1790      21.85    6.90     35.86    9.32
3ETF6(38)     R       988      26.23    6.97     37.26    8.61
              C      1790      21.85    6.90     35.86    9.32
Table 3 (cont'd)

Experimental Forms - Tucker, Levine,
Equipercentile, Modified IRT, 3-parameters Re-estimated*

                                     II                III
Form        Group       N       Mean    S.D.      Mean    S.D.

3DTF9         R      1265      25.39    6.96     35.98   10.20
              C       710      25.20    6.99     35.44    9.95
3DTF10        R      1575      25.98    6.11     35.59   10.48
              C       325      24.88    6.96     33.09   11.03
3ETF1         R      1530      25.96    6.93     36.59    9.88
              C       460      24.92    7.20     35.24   10.27
3ETF2         R      1275      26.62    6.68     37.57    9.90
              C       605      25.94    6.83     36.56   10.00
3ETF3         R      1710      26.14    7.01     35.13    9.20
              C       845      24.76    6.89     33.46    8.70
3ETF4         R      1825      26.24    6.37     38.64    9.21
              C       845      25.40    6.55     37.51    9.11
3ETF6(37)     R      1005      26.44    6.34     38.23    8.66
              C       315      25.38    7.04     36.39    9.51
3ETF6(38)     R       980      25.95    6.49     37.72    8.71
              C       305      24.93    7.04     35.72    9.25

* 3-parameter re-estimated based on regular group only.
Table 4.

Analysis of Covariance for Regular and Controlled Groups
for All Experimental Test Forms

Section II

Form        Group     r       VEE       b      Int.    P(A)*   P(B)*   P(C)*

3BTF11        R      .95†     5.10     1.61    4.59     .71     .93     .61
              C      .95      5.00     1.61    4.65
3DTF9         R      .94      5.54     1.75    3.05     .33     .91     .43
              C      .91      5.91     1.75    2.92
3DTF10        R      .93      6.03     1.79    2.77     .98     .85     .78
              C      .91      6.02     1.80    2.64
3ETF1         R      .93      6.90     1.92    1.51     .69     .75     .19
              C      .93      7.10     1.93    1.16
3ETF2         R      .94      5.47     1.74    3.83     .99     .33     .16
              C      .94      5.47     1.78    3.27
3ETF3         R      .79     18.15     1.44    8.13     .98     .87     .03
              C      .78     18.49     1.45    7.45
3ETF4         R      .78     15.78     1.12    7.67     .09     .91     .25
              C      .77     17.45     1.41    7.54
3ETF6(37)     R      .79     15.01     1.36    8.34     .26     .08
              C      .81     16.63     1.18    6.91
3ETF6(38)     R      .76     17.83     1.32    8.97     .13     .71     .36
              C      .77     20.50     1.34    8.38

Section III

Form        Group     r       VEE       b      Int.    P(A)*   P(B)*   P(C)*

3BTF11        R      .95†     1.05     1.74    3.39     .69     .34
              C      .95      1.03     1.72    3.76
3DTF9         R      .65      2.69     1.60    6.68     .12     .53     .53
              C      .82      3.20     1.57    9.03
3DTF10        R      .64      3.22     1.58    6.33     .41     .25
              C      .85      3.46     1.65    6.23
3ETF1         R      .87      2.38     1.61    7.34     .63     .55     .54
              C      .68      2.47     1.58    7.67
3ETF2         R      .83      3.06     1.60    7.37     .62     .83     .20
              C      .84      2.97     1.61    6.81
3ETF3         R      .78      2.79     1.41    9.17     .60     .06
              C               2.91     1.30   10.95
3ETF4         R      .60      3.07     1.62    9.26     .49     .42
              C      .81      3.20     1.66    6.28
3ETF6(37)     R      .60      2.71     1.40   11.27     .41     .66     .31
              C      .62      2.92     1.42   10.43
3ETF6(38)     R      .79      2.60     1.43   12.31     .25     .13
              C      .84      2.51     1.53   10.02

* P(A) = P_H0 [variance of errors of estimate]
  P(B) = P_H0 [slopes]
  P(C) = P_H0 [intercepts]
  Not indicated if P(B) < .50

† Spuriously high correlation. Internal anchor test.
No values of P(C) are listed if P(B) < .50. The tests are N dependent; thus
small differences may have greater significance with increasing sample size.
Samples used in this analysis were those based on the linked subtests, i.e.,
the samples from the bottom half of Table 3. Focusing on the slopes of the
regressions, the two groups were less alike on 3ETF2 and 3ETF6 (37) for
Section II. These were the forms, however, on which the slopes demonstrated
the greatest concurrence in Section III. On this section, the two groups
differed most in 3ETF3 and 3ETF6 (38). In general, the data indicated greater
similarity between groups on Section II than on Section III, a significant
relationship which impacts on later equating comparisons.

Equivalency of samples. Spiralling of subtest forms (distribution of the
subtests in serial order) is intended to assure equivalent samples when more
than one form of the test is administered. A rough evaluation of any effects
of spiralling on the equating samples for the September (3DTF9) through April
(3ETF4) administrations of this study can be achieved through comparisons of
mean ability for the fixed b's scaling and for the Modified IRT as given in
Figures 1 through 4. Differences in location of the dotted and solid lines
relative to the ordinates reflect the fact that each equating method is on a
different scale. If mean ability for the fixed b's sample can be considered
to be the more reliable estimate for these six administrations (since it is
taken across all subtests), then to the extent that the trends in the two sets
of data concur, the sample based on the linked subtests can be considered to
be representative of its group. The linked subtests graphs (dotted lines) for
the regular group appear to be in closer correspondence with the sample taken
across all forms than those for the controlled group. For both groups and
both sections, an effect due to spiralling can be observed at the April
administration where the single subtest sample produced relatively higher
means than the fixed b's sample.

The last two points of these plots represent the chain and direct forms,
respectively, which were used to determine the stability of the scales.
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Figure 1. 
sizeable differences in ability exist on these two forms on Section III for
both groups, and for the controlled group this difference is very marked.

Characteristics of the Forms Used in the Study

Table 5 lists data relevant to the nature of the forms included in the
study in terms of the average difficulty of the operational and equating
items. Comparisons of mean deltas (linearly transformed item difficulties;
see Angoff & Dyer, 1971) for the operational and equating items indicate that
the forms are parallel in terms of mean difficulty with the exception of
Section III in 3ETF4, a relatively easy form, and Section III of 3BTF11, which
was the most difficult form in the study. Overall, the equating items for the
linked forms were slightly more difficult than the operational test,
especially in Section III. The characteristics of the forms used in the
equating comparisons closely parallel Variation 8 in the Petersen, Marco and
Stewart study in that, for some equatings, the base form was slightly more
difficult than the test to be equated, and for Section III of subtest 38, the
anchor test was more difficult than the operational test. These conditions
were found to rather consistently produce the greatest error in the evaluation
of linear equating (Petersen, Marco & Stewart, 1982, Table 10).
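
As an illustration of the delta scale referred to above, the following sketch
converts proportions correct to deltas and averages them over a set of items.
It assumes the conventional ETS definition, delta = 13 + 4z, where z is the
normal deviate corresponding to the proportion answering incorrectly; the
report itself only cites Angoff & Dyer (1971) for the definition. The function
names and the sample proportions are hypothetical.

```python
from statistics import NormalDist


def delta_from_p(p_correct):
    """Delta index under the assumed definition 13 + 4z (higher = harder)."""
    if not 0.0 < p_correct < 1.0:
        raise ValueError("proportion correct must lie strictly between 0 and 1")
    # z is the normal deviate for the proportion answering incorrectly.
    z = NormalDist().inv_cdf(1.0 - p_correct)
    return 13.0 + 4.0 * z


def mean_delta(p_values):
    """Average delta over a set of items (e.g., operational or equating items)."""
    deltas = [delta_from_p(p) for p in p_values]
    return sum(deltas) / len(deltas)


# Hypothetical proportions correct for two small item sets.
operational = [0.70, 0.55, 0.62]
equating = [0.60, 0.50, 0.58]
print(round(mean_delta(operational), 2), round(mean_delta(equating), 2))
```

Under this definition an item answered correctly by half the group has a delta
of 13, and a set of harder items yields a higher mean delta.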

The results described above have obvious implications for the reliability
of the equating comparisons in Section III. A common procedure in evaluating
the results of an equating experiment has been the use of the identity
equating (Levine, 1955; Petersen, Marco and Stewart, 1982). In this case, the
base form is re-administered as the final link, and lack of scale stability is
evaluated in terms of the departure of the slope of the equating line from
unity. Objections to this method involve the possible advantage derived from
equating a test to itself in the case of the one-parameter IRT model
(Petersen, Marco and Stewart, 1982). An alternative procedure of using two
forms, one based on a direct link to the scale and the other the result of the
chain, was adopted in this study to circumvent this objection. Equivalent
samples for these forms were assumed to be attainable by virtue of spiralling.
The second requirement of equivalent difficulty of the anchor tests was
difficult to achieve due to current limitations on the availability of items
Table 5

                               Section II                                Section III
                    Operational           Equating            Operational           Equating
Form        Grp   Mean   S.D.      r   Mean   S.D.      r   Mean   S.D.      r   Mean   S.D.      r

3BTF11      R    11.46   2.22    .32  11.98   2.07    .54  12.60   1.95    .48  12.49   1.94    .47
            C    11.66   2.21    .31  11.??   2.07    .54  12.62   1.97    .47  12.50   1.91    .46
3DTF9       R    11.??   2.22    .54  12.33   1.93    .53  12.20   1.?5    .48  12.52   1.81    .49
            C    11.86   2.23    .54  12.23   1.98    .52  12.20   1.82    .47  12.50   1.76    .46
3DTF10      R    11.91   2.09    .54  12.05   1.89    .54  12.26   1.75    .51  12.47   1.50    .48
            C    11.?8   2.08    .53  11.93   1.87      ?  12.25   1.76    .51  12.35   1.53    .48
3ETF1       R    11.83   2.33    .54  11.58   2.54    .57  12.28   2.03    .49  12.27   2.22    .52
            C    11.84   2.36    .55  11.62   2.42    .55  12.29   2.00    .50  12.32   2.14    .53
3ETF2       R    11.73   2.23    .52  11.89   2.68    .49  12.41   1.77    .49  12.35   1.92    .48
            C    11.74   2.29    .51  11.87   2.75    .48  12.37   1.75    .48  12.29   1.88    .48
3ETF3       R    11.83   2.24    .55  12.19   1.??    .52  12.37   2.17    .47  12.20   1.70    .48
            C    11.81   2.29    .53  12.09   2.00    .49  12.33   2.19    .44  12.19   1.66    .45
3ETF4       R    11.83   2.??    .52  11.80   2.70    .53  11.99   1.99    .50  12.28   2.24    .47
            C    11.88   2.42      ?  11.84   2.76    .52  11.99   1.99    .50  12.28   2.25    .47
3ETF6 (37)  R    11.91   2.12    .50  ??.81   2.24    .53  12.13   2.13    .46  12.23   1.66    .46
            C    11.97   2.12    .52  12.??   2.28    .54  12.17   2.15    .49  12.27   1.76    .50
3ETF6 (38)  R    11.92   2.08    .51  12.03   ?.66    .49  12.10   2.13    .47  12.53   1.83    .38
            C    11.96   2.12    .52  12.07   1.78    .51  12.14   2.13    .46  12.56   1.84    .45
at the extreme ranges of difficulty. The 3BTF11 form was the only relatively
recent test edition of TOEFL with linear parameters to the TOEFL scale, and as
it turned out, a much more difficult form than those used in the study. For
Section III, where a set of reading comprehension items are dependent on a
single passage, there is little leverage for manipulation of the level of
difficulty. As a result, anchor tests of equivalent difficulty were not
attainable for Section III.

The characteristics of the samples and the tests used in the equating
comparisons can be summarized as follows:

Section II

1. Base form and operational test are of equivalent difficulty for all
   equatings.

2. In the regular group the anchor test on subtest 37 was easier than that
   on 38. For the controlled group, the anchor tests were of equivalent
   difficulty.

3. Anchor test roughly equivalent in difficulty to operational test for both
   groups.

4. Equivalent ability (based on mean theta) on both subtests within groups.

Section III

1. Base form more difficult than the test to be equated for IRT equatings.
   For conventional equatings, base form and test to be equated were of equal
   difficulty.

2. For both groups, the anchor test on subtest 38 was more difficult than that
   on subtest 37.

3. Anchor test relatively more difficult than operational test. Greatest
   differences in difficulty observed on subtest 38.

4. Nonequivalent ability for subtest 37 and 38 within regular and controlled
   groups.
Given these conditions, comparisons for Section II can be considered to be a
valid assessment of the effect of controlling for native language
representation and of the stability of the equating methods. The variations
observed in the forms for the Section III comparisons exemplify some of the
difficulties existing for that section on operational TOEFL. Criterion
comparisons for this section will be confounded by substantial departures from
optimal conditions and should be interpreted accordingly.

Equating Criterion Comparisons - Regular Group

Discrepancy indices, based on scaled scores, for the regular group are
presented in the top half of Table 6. Least error was observed for fixed b's
scaling in both sections, with Modified IRT and Tucker equating following in
the order of magnitude of error. A positive bias indicates that the
criterion, i.e., the direct equating, tended to produce higher scores than the
chain, and conversely for negative bias. In Section II, the chain results
underestimated the criterion scores, while in Section III, the criterion was
overestimated, this latter effect probably due, in part, to the variations in
difficulty described above. Indeed, the major effect of the variations
observed in Section III was the direction of the bias; however, fixed b's
equating was the least sensitive to these differences.
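
The relationship among the three columns of Table 6 can be sketched as
follows, assuming the discrepancy indices are frequency-weighted summaries of
the differences between the criterion (direct) and chained conversions, with
total error equal to the variance of the differences plus the squared bias.
The function name, argument layout and data are hypothetical and are not
taken from the study.

```python
def discrepancy_indices(criterion, chained, freqs):
    """Weighted variance, squared bias and total error of conversion differences.

    criterion, chained: scaled scores for the same raw scores (direct vs. chain).
    freqs: frequency of each raw score in the weighting distribution.
    """
    n = sum(freqs)
    # Positive difference: the criterion (direct) conversion is higher.
    diffs = [d - c for d, c in zip(criterion, chained)]
    mean_diff = sum(f * d for f, d in zip(freqs, diffs)) / n
    total_error = sum(f * d * d for f, d in zip(freqs, diffs)) / n
    squared_bias = mean_diff ** 2
    sign = "+" if mean_diff > 0 else "-" if mean_diff < 0 else ""
    return {
        "variance": total_error - squared_bias,
        "squared_bias": squared_bias,
        "total_error": total_error,
        "bias_sign": sign,
    }


# Hypothetical conversions at three raw scores, weighted 1:2:1.
result = discrepancy_indices([50, 52, 54], [49, 51, 53], [1, 2, 1])
print(result)
```

A constant shift between the two conversions, as in this example, puts all of
the total error into the squared bias and none into the variance.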




The magnitude of the proportion of squared bias for Modified IRT is
observed to be quite large for both sections. Although the error for the
three-parameters re-estimated model was large compared to other IRT methods,
most of this error was due to the variance of the differences. These results
are inherent in the models, however. The constant values of the a-parameter
in the Modified IRT vary from form to form only by division of the slope of
the linear transformation which, in turn, limits the range of the slopes of
the test characteristic curves. When compared to the criterion, the major
difference is simply a shift in location. As a result, the variability of the
differences will be a small portion of the total error. On the other hand,






Table 6

Equating Criterion Comparisons

                        Section II                   Section III
Method             Var.     Bias    Error       Var.     Bias    Error

Regular Group
Modified IRT        .04  (+)1.04     1.08        .35  (-)1.74     2.09
Fixed b's           .52  (+) .10      .62        .21  (-) .61      .82
3-parameter        3.84  (+)1.64     5.48       3.48  (-)2.36     5.64
Tucker             1.38  (+)2.55     3.93       1.10  (-)1.34     2.44
Levine             3.19  (+)4.02     7.21       2.48  (-)2.20     4.68
Equipercentile     2.00  (+)4.61     6.61        .51  (-)4.41     4.92

Controlled Group
Modified IRT        .15  (+)1.04     1.19       1.51  (-)9.23    10.74
Fixed b's           .16      .00      .16        .21  (-) .51      .72
Tucker              .05  (+)1.74     1.79       2.21  (-)2.72     4.93
Levine              .14  (+)2.72     2.66       5.88  (-)4.15    10.03
Equipercentile      .69  (+) .85     1.54        .61  (-)3.86     4.47
the slopes of the test characteristic functions for the 3-parameter model can
vary substantially, accounting for less systematic difference. These effects
can be seen in graphs of the unweighted differences between the criterion and
chained results in Figures 5 through 12. For linear equating the graph of the
differences is simply a line of negative slope; the greater the absolute value
of the slope, the greater the bias.
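
The reason the difference graph is a straight line for linear equating can be
sketched as follows: when both the criterion (direct) and chained conversions
are lines, their difference is again a line whose slope is the difference of
the two slopes. The conversion constants below are hypothetical, chosen only
to give a negative slope like that described in the text.

```python
def difference_line(direct, chain):
    """Slope and intercept of (criterion - chained) for two linear conversions.

    direct, chain: (slope, intercept) pairs of the two conversion lines.
    """
    (a_direct, b_direct), (a_chain, b_chain) = direct, chain
    return a_direct - a_chain, b_direct - b_chain


def difference_at(raw_score, direct, chain):
    """Criterion minus chained scaled score at a given raw score."""
    slope, intercept = difference_line(direct, chain)
    return slope * raw_score + intercept


# Hypothetical conversions: scaled = slope * raw + intercept.
direct = (1.25, 22.0)
chain = (1.5, 20.0)
print(difference_line(direct, chain))
print([difference_at(x, direct, chain) for x in (0, 8, 40)])
```

In this example the two conversions agree at a raw score of 8 and diverge
toward the extremes of the raw score range.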
Standard errors of measurement ranged from 2.92 to 4.03 for Section II and 
from 2.61 to 3.20 for Section III. Other studies have determined that the 
standard error of equating is generally less than the standard error of 
measurement. Equating errors are larger at the tails of the distribution and, 
among equating methods, largest for equipercentile equating (Lord, 1981(a)). 
The mean difference (square root of the squared bias) for all criterion 
comparisons fell within the range of the standard error of measurement. The 
upper and lower limits of the converted score scale are, in part, determined 
by the method of equating. For the IRT equatings, these limits are the scaled 
scores at the upper asymptote of the test characteristic curve of the old form 
and a lower limit of 20 (see Appendix A). In the equipercentile equatings of 
this study, the upper and lower limits of the converted scores correspond to
the range of observed raw scores. Depending on how the slopes differ in
linear equating, greatest differences will generally occur at either or both
extremes of the scale. Lists of conversions are given in Appendix C. The
frequencies listed there, used to compute the discrepancy indices, are from a
representative form of TOEFL. Table C9 in Appendix C presents the scaled
scores and standard deviations for each of the experimental samples. 
Figure 5.
Figure 7.
Equating Comparisons - Controlled Group

Results of the equating comparisons for the controlled group are given in
the lower half of Table 6, where it can be seen that fixed b's equating again
yielded the least error for both sections, and that this error was less than
that observed for the regular group. It can also be observed that controlling
for native language produced greater scale stability in Section II for most
equating methods, but substantially more error for Modified IRT, Levine and
Tucker equating in Section III.

The results for Section III may be related to the effects noted in Table
4, where the regressions of total score on the anchor test indicated greater
dissimilarity between the regular and controlled groups on Section III than on
Section II. Furthermore, the difficulty of the anchor test on subtest 38 may
have impacted heavily on the criterion comparisons for the controlled group.

Controlling for native language may have affected the dimensionality of
Section III in some unexpected way. Since the controlled group (21 native
languages out of 154) is more precisely defined in terms of language group
composition, the group may have been more sensitive to subtle variations in
test content in this section. It is possible that controlling on Section III
might have required a different kind of sampling, perhaps elimination of
certain language groups altogether. It is probable that the complexities of
the linguistic and factorial relationships of the test, as they impact on
native language groups or groupings, militate against any simple method of
sampling. Until these relationships are better understood, a random sample of
the total group appears to be the most reliable method of sampling TOEFL
examinees. However, a more fundamental problem, dealing with the structure of
Section III in terms of parallelism across forms and the implications for its
construct validity, is suggested by these results (see Swinton & Powers, 1980,
pp. 20-21; Alderman & Holland, 1981, p. 18).

Item Ability Regressions

Figures 13 and 14 are item ability regressions for six items from Section
III of 3ETF6, subtest 38, based on the Modified IRT with sample sizes of 308
and 988, respectively. Figure 15 displays item curves for the same items
derived from the 3-parameters re-estimated model. The vertical lines of these
plots denote the standard error of the curve. While it is clear that the
larger sample size improves the fit of the item ability regressions, the
effect on the equated scores is small indeed, total error amounting to only
.003 for comparison of the Modified IRT based on the two sample sizes. Plots
of the Modified IRT and 3-parameters re-estimated are quite comparable with
the exception of item 24, which could not be fit by an average value of the
a-parameter. Of the sixty items in this section, six could be identified as
requiring a slope parameter other than the average. The discrepancy index for
the 3-parameter vs. Modified IRT (based on the smaller sample) is only .09 as
listed in Table 8. In the absence of the effects due to methods of scaling or
linking, the practical impact on the equated scores of small variations in the
a-parameter among a few items seems to be relatively minor.

As noted earlier, fixed b's equating has been observed to occasionally 
result in poor fit among precalibrated items. Indeed, it was for this reason 
that the 3-parameters re-estimated model was included in this study. An 
example of an item better fit by re-estimating the b-parameter is given in 
Figure 16. This was the most deviant fit of the precalibrated items in these
comparisons. 
Figure 13. Item ability regressions for six items, Section III,
Modified IRT, N = 308.

Figure 14. Item ability regressions for six items, Section III,
Modified IRT, N = 988.

Figure 15. Item ability regressions for six items, Section III,
3-parameters re-estimated, N = 988.

Figure 16. Item ability regressions for an item scaled by
fixed b's and 3-parameters re-estimated.
Equatings Based on Separate Language Groups

One of the experimental administrations provided samples of sufficient
size to independently equate four language groups. Fixed b's conversions were
computed for Arabic, Chinese, Japanese and Spanish examinees taking 3ETF3.
These equatings were compared with that based on the total controlled group.
Discrepancies as listed in Table 7 were small in general, and the rankings in
terms of total error were somewhat different in the two sections.

Methods Comparisons

Discrepancy indices between methods based on the direct and chained
results for the regular group are given in Tables 8 and 9. Differences
observed in the two tables are illustrative of a major source of error in
equating. From Table 8, all else being equal, the various methods produce
comparatively similar results, while discrepancies listed in Table 9 reflect,
among other things, the variability due to methods of linking the forms. From
Table 9, we observe the not too surprising result that Tucker and Levine
equatings are the most similar when the effects of linking are taken into
account. Among the largest differences observed are the discrepancies between
fixed b's and the 3-parameters re-estimated, which probably incorporate some
of the effects of re-estimating the b-parameters vs. holding them fixed.
Modified IRT vs. fixed b's have the smallest error among all the IRT
comparisons.

It can also be observed that the values of total error in Section II tend to
be higher than those in Section III. This may be due to the fact that a 41
point observed score scale in Section II, as contrasted with a 61 point raw
score scale for Section III, is being stretched to one that can theoretically
range from 20 to 80.
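
The arithmetic behind this stretching argument can be sketched as follows,
assuming, purely for illustration, a uniform linear stretch of each raw score
scale over the full 20 to 80 range (the actual conversions need not be
linear). The function name is hypothetical.

```python
def scaled_points_per_raw_point(n_score_points, scale_min=20, scale_max=80):
    """Scaled-score units spanned by one raw-score point under a uniform stretch."""
    return (scale_max - scale_min) / (n_score_points - 1)


section_ii = scaled_points_per_raw_point(41)   # 41 point scale in Section II
section_iii = scaled_points_per_raw_point(61)  # 61 point scale in Section III
print(section_ii, section_iii)

# A one-point raw discrepancy contributes its square to total error, so the
# same raw discrepancy weighs more heavily in Section II under this assumption.
print((section_ii / section_iii) ** 2)
```

Under this simplification each raw score point in Section II spans 1.5 scaled
points against 1.0 in Section III, so equal raw discrepancies would produce
squared discrepancies 2.25 times as large in Section II.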
Table 7

Comparisons of IRT Fixed b's Equating for Four
Language Groups with Total Group (Controlled)

                      Section II                        Section III
            Var. Diff.  Sq. Bias   Error      Var. Diff.  Sq. Bias   Error

Arabic          .02        .36       .38          .10        .01       .11
Chinese         .10        .01       .11          .04        .04       .08
Japanese        .26        .87      1.13          .05        .14       .19
Spanish         .29        .00       .29          .02        .41       .43
Table 8

Total Error and Squared Bias*, Comparisons of Equating
Methods, Direct Results, Regular Group

                            Methods**
Methods         1       3       F       T       L       E

Section II
   1           --     .27     .44     .26     .07     .29
   3          .12      --     .06     .14     .14     .11
   F          .12     .00      --     .30     .44     .11
   T          .23     .02     .02      --     .08     .28
   L          .04     .02     .14     .08      --     .17
   E          .03     .04     .03     .11     .00      --

Section III
   1           --     .09     .07     .52     .31     .27
   3          .01      --     .07     .96     .58     .26
   F          .05     .01      --     .69     .29     .19
   T          .04     .10     .18      --     .12     .51
   L          .02     .06     .02     .00      --     .30
   E          .02     .00     .00     .08     .05      --

*Total error above diagonal, squared bias below diagonal.

**1-(Mod. IRT), 3-(3-param), F-(Fixed b's),
T-(Tucker), L-(Levine), E-(Equipercentile)
Table 9

Total Error and Squared Bias*, Comparisons of
Equating Methods, Chain Results, Regular Group

                            Methods**
Methods         1       3       F       T       L       E

Section II
   1           --    3.95    1.70    2.57    4.60    3.36
   3          .37      --    9.68    1.43    1.33    1.47
   F         1.01    2.62      --    7.94   11.06    9.35
   T         1.12     .20    4.26      --     .36     .17
   L         1.42     .34    4.83     .02      --     .35
   E         1.64     .44    5.26     .04     .00      --

Section III
   1           --    1.29     .49    1.26    1.52    1.43
   3          .11      --    3.31    1.20     .86    3.66
   F          .09     .41      --    2.39    3.09    1.56
   T          .13     .48     .00      --     .18    3.79
   L          .00     .10     .11     .15      --    3.51
   E          .81     .33    1.43    1.53     .73      --

*Total error above diagonal, squared bias below diagonal.

**1-(Mod. IRT), 3-(3-parameter), F-(Fixed b's),
T-(Tucker), L-(Levine), E-(Equipercentile)
Discussion and Conclusions

Scale stability using a controlled group. The premise that language group
control would improve the stability of the TOEFL scale for various equating
methods was generally supported by the outcomes in Section II. The opposite
result was observed in Section III, in which case error was increased for
Modified IRT and linear equating methods for the group controlled for native
language. Only for fixed b's IRT and equipercentile equatings was there a
small reduction in error when this type of control was exercised. While
marked ability differences on the two subtests may have confounded the
comparisons for the controlled group in Section III, the possibility remains
that other factors related to the multidimensionality of Section III for some
language groups contributed to these results. Based on the findings of this
study, controlling for language group representation is not recommended for
operational TOEFL at this time.

Fixed b's scaling. The current method of IRT equating by fixing b's
produced the greatest scale stability for both groups. It is not surprising
that this method of scaling would produce such excellent results in terms of
the criterion of this study, since the location parameters for half (or more)
of the items in each section are fixed with only the a- and c-parameters
allowed to vary. Assuming that the b-parameters held for subsequent groups,
bias in the a-parameters would be a major source of error. Positive
statistical bias does exist for the a's and is greatest for highly
discriminating, difficult items (Lord, 1982). In fixed b's scaling, an upper
limit of 1.5 is placed on the estimated a-parameter, which may reduce the
effect of bias for this group of items. Plots of precalibrated vs.
re-estimated a's collected over time have exhibited no obvious evidence of
bias, differing only in degree of scatter about the line through the origin.
A detailed analysis of the precalibrated and re-estimated a's has also failed
to detect any evidence of bias. In practical terms, fixed b's equating offers
flexibility and item security which cannot be derived from methods of equating
based on a block of items common to two forms, since compromise of the first
form can jeopardize an entire future administration.
Precalibrated items that are identified as seriously aberrant in terms of
fit might be treated as noncalibrated and all parameters re-estimated on the
current group. Such items could be identified prior to item calibrations by
comparing equated deltas based on pretesting with those derived in a
preliminary item analysis (equated deltas and b-parameters have been found to
correlate very highly, approximately .96). Such a procedure is workable so
long as these items remain a small proportion of the precalibrated items, as
they are currently.

Modified IRT. Results for the Modified IRT were quite satisfactory for
both sections in the regular group. However, as the results for the
controlled group indicate, this method may be sensitive to differences in
ability. Coupled with the poor performance of the one-parameter IRT model
when two tests of unequal difficulty are being equated (Petersen, Marco &
Stewart, 1982), Modified IRT is probably not suited to TOEFL data where such
variations are likely to occur.

A practical advantage to this method is the smaller sample size required
for parameter estimation, which would have material impact on the difficulties
involved in maintaining a precalibrated item pool. Associated with this is the
reduction in computer costs for estimating parameters. Typical costs for
running LOGIST (IV) were as follows:

              Modified IRT    3-Parameters
                N = 300         N = 1500

60 items         $10.98          $49.31
90 items          13.02           77.29

It is clear that application of Modified IRT to TOEFL would require acceptance
of some inadequately fit items.

3-parameters re-estimated. The relatively large error associated with
estimating all three parameters may reflect the "true" effects of the
variability associated with TOEFL testing groups. The hypothesis that a less
sensitive IRT model might produce better results with TOEFL data was supported
by the outcomes for fixed b's as compared to those for the 3-parameters
re-estimated model. Fixed b's scaling, as implemented by TOEFL, might be
categorized as a less sensitive model by virtue of the constraints imposed on
the variation of some of the b-parameters and the limits on the a-parameters.
In contrast, a great deal more information about the current group is
introduced into the scaling process in estimating all three parameters.


Conventional equating methods. Of the linear methods, Tucker equating
produced the best results, outperforming the 3-parameter re-estimated IRT
model. It might be concluded that basic assumptions of Levine equating were
not met by the data as, for example, the requirement of parallelism of the
anchor and operational tests. As noted earlier, the equating conditions of
this investigation for Section III were roughly similar to Variation 8 of the
Petersen, Marco and Stewart (1982) study, which produced large total error.
Under optimal conditions, better performance for linear equating might be
expected. The results in Section III provide some information regarding the
robustness of various equating methods under less than ideal circumstances.
In terms of these outcomes, Tucker equating fared rather well, even though it
is known to be less than ideal when ability distributions differ.

Limitations of the study. The criterion of this study was the stability
of the scale over several links, and no attempt was made to evaluate item
fit. Indeed, the Modified IRT version would have been ruled out a priori on the
basis of item fit. The implicit assumption was that all IRT methods
would provide reasonable fit to the data. The conclusions from this study are
limited by the tenability of this assumption.

Conclusions. The possible dangers and difficulties associated with
sampling the extremely complex TOEFL testing population for the purpose of
equating were demonstrated in this study, with the resulting recommendation for
the continuation of random sampling of the total testing group. This
recommendation emanates from the results for Section III for the controlled
group, which were consistent with findings from earlier studies (Swinton &
Powers, 1980; Alderman & Holland, 1981). Associated with this is the
need to evaluate the basic structure of Section III in terms of its
parallelism across forms and its construct validity.

Fixed b's, Modified IRT and Tucker linear equating produced satisfactory
results for Sections II and III of TOEFL in the regular group of examinees.
Owing to its apparent sensitivity to ability differences, Modified IRT should
probably not be considered for application to TOEFL. Consequently, fixed b's
and Tucker linear equating appear to be the best candidates for equating TOEFL
test scores. Each has practical and/or theoretical advantages and
disadvantages which can be weighed in terms of program resources and the best
interests of the examinee population. The question of item security is of
paramount importance to the TOEFL program, which administers the tests
worldwide, precluding tight control of the security of pretest items. Indeed,
problems associated with compromised items in the Far East were the primary
reason for adopting IRT in terms of fixed b's scaling. The compromise of a
single test form overseas could invalidate the entire form to which it is
linked. Fixed b's equating, depending as it does on equating items from many
forms, avoids this major difficulty. It is clear that tradeoffs between ideal
statistical conditions and practical realities cannot be avoided, for there is
probably far greater error in compromised equating items than in the
occasionally observed poor fit of a few precalibrated items.

In applying IRT to TOEFL data, response patterns of a complex population
are being fit by a complex model, providing ample opportunities for evidence of
the error incurred in analyzing behavioral data via mathematical models.
Aside from concerns in terms of meeting the basic assumptions of the model,
there are questions related to the statistical properties of model parameters
(e.g., bias; see Lord, 1981(b); 1982). Associated with this are the effects of
some of the artificial constraints on parameter estimation, such as the value
chosen for the upper limit of the a-parameter, to which can be added
unpredictables such as variations in instructional patterns which may be a
source of some of the differences observed in item fit of precalibrated items,
errors due to test administration, and finally, the social and political
factors which can affect the nature of the population. While Tucker linear
equating is subject to many of the same sources of error as IRT equating, it
depends on far fewer parameters and, as the data in Table 8 demonstrate, is
appreciably better, in terms of scale stability, than the 3-parameter IRT
model freely applied to TOEFL data, that is, estimating all three parameters
for all items. These results imply that TOEFL data probably require some
constrained method of IRT equating in order to control for the many sources of
variability.

Although ideal equating conditions could not be established in this study
for the purpose of evaluating the various equating methods, this lack of
optimality may have provided a more accurate reflection of practical
implications. This study has demonstrated that a randomly sampled population
consisting of 154 or more language groups is viable for a restricted form of
IRT methodology. Ostensibly, the assumptions underlying Tucker linear
equating are not being seriously violated by TOEFL data. In fact, Tucker
equating demonstrated a measure of robustness in the face of less than ideal
circumstances. However global this population, it apparently possesses lawful
regularities in its own right amenable to certain statistical operations. The
criteria for this conclusion are the empirical results of an equating
experiment. While it would have been desirable to establish the suitability
of a given equating method to TOEFL via more analytic methods, consistency of
results in practical applications is often the only source of methodological
validation.
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APPENDIX A 



Equating TOEFL Test Scores 



Once item parameters have been estimated, test scores, x, on a new form of 
TOEFL are equated to test scores, X, on a base form using true score equating 
(Lord, 1980, pp. 199-205). The equated scores are then placed on the reported 
score scale through a known linear transformation. If Y is a scaled score, and 
A and B are known constants that linearly transform X, a number right score on 
the base form, then 

    $Y = AX + B$                     3(a) 
    $X = \sum_i P_i(\theta)$         3(b) 
    $x = \sum_j P_j(\theta)$         3(c) 

define a transformation Y(x) which equates observed score x, through the 
elimination of X and $\theta$ from the given equations. Here the sum in 3(b) 
runs over the items of the base form and the sum in 3(c) over the items of the 
new form. In practice, this is accomplished by substituting the observed number 
right score on the new form for x in 3(c), then, using the known item parameter 
estimates, solving for $\theta$. Inserting this value of $\theta$ in 3(b) and 
using the known item parameters for this form, an equated number right score, 
X, results. Scaled scores follow from 3(a). Scaled scores are rounded to the 
nearest integer, with those above 80 and below 20 set to 80 and 20, 
respectively. The total test scaled score is obtained by summing the section 
scaled scores and multiplying the result by 10/3. The true score distribution 
is bounded below by $\sum c$, the sum of the c-parameters; thus, the 
conversions obtained from the equating method above apply only to scores above 
$x = \sum c$. For observed scores below this level, where there are relatively 
few observations, a line relating the c's on the old and new form is 
calculated. 
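
The procedure in 3(a) to 3(c) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the operational TOEFL program: it assumes a three-parameter logistic item response function with scaling constant D = 1.7, solves 3(c) for theta by bisection, and uses hypothetical names (`p3pl`, `solve_theta`, `scaled_score`, `total_scaled_score`) and made-up item parameters. The separate line used for scores at or below the sum of the c's is not covered.

```python
import math

def p3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic probability of a correct response (D = 1.7 is an assumption)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def true_score(theta, items):
    """Sum of item response functions, as in 3(b) and 3(c); items is a list of (a, b, c)."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)

def solve_theta(x, items, lo=-8.0, hi=8.0, tol=1e-10):
    """Solve true_score(theta) = x by bisection; only valid above the sum of the c's."""
    if true_score(lo, items) >= x or true_score(hi, items) <= x:
        raise ValueError("score is outside the range covered by true score equating")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if true_score(mid, items) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def scaled_score(x, new_items, base_items, A, B):
    """Equate new-form score x to the base form, then apply Y = AX + B, round, and clip to 20..80."""
    theta = solve_theta(x, new_items)      # 3(c): new form
    X = true_score(theta, base_items)      # 3(b): base form
    Y = A * X + B                          # 3(a): reported scale
    rounded = int(math.floor(Y + 0.5))     # round half up to the nearest integer
    return min(80, max(20, rounded))

def total_scaled_score(section_scores):
    """Total score: sum of the section scaled scores multiplied by 10/3."""
    return sum(section_scores) * 10.0 / 3.0
```

When the new form and the base form share the same item parameters, `scaled_score` reduces to the linear transformation of x itself, which gives a quick sanity check of the sketch.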
APPENDIX B 

Some Concepts Underlying Equipercentile and Tucker Linear Equating 

In order to implement the definitions of equipercentile and linear 
equating given on page 4, a synthetic population T is formed as a weighted 
composite of Group P taking old form Y and Group Q taking new form X. In the 
case of common item equating, information derived from a set of items, V, 
common to old form Y and new form X, aids in the determination of the 
distributions and first two moments of the synthetic equating group. Adopting 
the development of Braun and Holland (1982), the data necessary to produce 
this information is given in Table B. 

Table B 

Distributions Required for Tucker and Equipercentile Common Item Equating 

             Old Form Y       New Form X       Common Items 

Group P      $F_P(Y|v)$       $G_P(X|v)$*      $K_P(v)$ 
             $E_P(Y|v)$       $E_P(X|v)$* 

Group Q      $F_Q(Y|v)$*      $G_Q(X|v)$       $K_Q(v)$ 
             $E_Q(Y|v)$*      $E_Q(X|v)$ 

* Not observable 



In this table, for example, $F_P$ is the conditional distribution of Y given 
V = v in population P, $K_P(v)$ is the distribution of V in P and $E_P$ is the 
regression of Y on V in P. The purpose, then, is to derive unconditional 
distributions of the old and new forms for the synthetic population T, 
$F_T(y)$ and $G_T(x)$, given the information listed in Table B. $F_T(y)$ can be 
written 

    $F_T(y) = w_1 \int F_P(y|v)\,dK_P(v) + w_2 \int F_Q(y|v)\,dK_Q(v)$. 

However, $F_Q$ is not observable, but if it is assumed that $F_P = F_Q$, then 
$F_T$ is 

    $F_T(y) = \int F_P(y|v)\,dK_T(v)$ 

where $K_T(v) = w_1 K_P(v) + w_2 K_Q(v)$. Similarly, the distribution function 
of X is 

    $G_T(x) = \int G_Q(x|v)\,dK_T(v)$. 

The equipercentile equating function according to its definition is then 

    $e_T(x) = F_T^{-1}(G_T(x))$. 

For the case of Tucker linear equating, the function is 

    $\ell_T(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}(x - \mu_X)$, 

where $\mu$ and $\sigma$ denote means and standard deviations in the synthetic 
population T. Assuming $E_P = E_Q$ from Table B for X and Y, then 

    $\mu_Y = \int E_P(Y|v)\,dK_T(v)$ 
    $\mu_X = \int E_Q(X|v)\,dK_T(v)$. 

Formulas for the variances of the synthetic population are similarly derived 
based on analogous assumptions (i.e., $Var_P(Y|v) = Var_Q(Y|v)$, etc.). On the 
assumptions of linearity of regressions and homoscedasticity of errors, one 
result of the foregoing is 

    $E_P(Y|v) = E_Q(Y|v) = av + b$ 
    $Var_P(Y|v) = Var_Q(Y|v) = \sigma^2$, 

with analogous formulas for form X. Thus, Tucker linear equating as well as 
common item equipercentile equating is based on untestable assumptions, since 
data for the old form in Q as well as for the new form in P are not available. 
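
The Tucker computation described above can be illustrated with a short Python sketch. It is a minimal example under the stated assumptions (linear regressions on V and constant conditional variances, both taken to hold in P and Q alike), with V treated as a discrete anchor score 0, 1, 2, ... and weights summing to one. All function names and numbers are hypothetical and are not taken from the study.

```python
import math

def mix(k_p, k_q, w1, w2):
    """Anchor distribution in the synthetic population: K_T = w1*K_P + w2*K_Q."""
    if abs(w1 + w2 - 1.0) > 1e-12:
        raise ValueError("weights are assumed to sum to one")
    return [w1 * p + w2 * q for p, q in zip(k_p, k_q)]

def moments(pmf):
    """Mean and variance of a discrete score 0, 1, 2, ... with the given probabilities."""
    mean = sum(v * p for v, p in enumerate(pmf))
    var = sum((v - mean) ** 2 * p for v, p in enumerate(pmf))
    return mean, var

def synthetic_moments(k_t, slope, intercept, resid_var):
    """Mean and variance in T of a form whose regression on V is slope*v + intercept."""
    mean_v, var_v = moments(k_t)
    mean = slope * mean_v + intercept          # integral of E(.|v) dK_T(v)
    var = resid_var + slope ** 2 * var_v       # law of total variance
    return mean, var

def tucker_linear(x, k_p, k_q, reg_y_in_p, reg_x_in_q, w1=0.5, w2=0.5):
    """Linear equating of new-form score x to the old-form scale in population T.

    reg_y_in_p and reg_x_in_q are (slope, intercept, residual variance) triples
    estimated where each form was actually observed (Y in P, X in Q).
    """
    k_t = mix(k_p, k_q, w1, w2)
    mu_y, var_y = synthetic_moments(k_t, *reg_y_in_p)
    mu_x, var_x = synthetic_moments(k_t, *reg_x_in_q)
    return mu_y + math.sqrt(var_y / var_x) * (x - mu_x)
```

If the two forms have identical regressions on the common items, the function returns x unchanged; if the intercepts differ by one point and everything else is equal, it shifts every score by one point.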
APPENDIX C* 

Final Converted Scores 

Table C1 
IRT Conversions, Regular Group 
Section II 

RAW   FRQ   MIRT37  MIRT38    3P37    3P38    FB37    FB38 
---   ---   ------  ------  ------  ------  ------  ------ 
 40    58    65.83   65.83   65.83   65.83   65.83   65.83 
 39   109    64.56   64.90   65.46   64.75   64.60   64.86 
 38   135    63.27   63.84   64.84   63.59   63.37   63.83 
 37   167    62.01   62.72   64.02   62.47   62.16   62.76 
 36   235    60.75   61.56   63.02   61.38   60.96   61.64 
 35   276    59.50   60.37   61.86   60.28   59.74   60.46 
 34   326    58.25   59.17   60.58   59.13   58.51   59.22 
 33   362    57.00   57.94   59.18   57.91   57.26   57.90 
 32   454    55.74   56.70   57.64   56.62   56.00   56.53 
 31   438    54.47   55.45   55.98   55.27   54.74   55.11 
 30   453    53.18   54.19   54.21   53.88   53.48   53.68 
 29   504    51.89   52.92   52.37   52.46   52.24   52.24 
 28   513    50.59   51.65   50.50   51.04   51.03   50.82 
 27   551    49.28   50.36   48.65   49.64   49.85   49.44 
 26   512    47.97   49.07   46.85   48.26   48.70   48.11 
 25   508    46.65   47.78   45.12   46.96   47.57   46.82 
 24   495    45.32   46.48   43.45   45.65   46.47   45.58 
 23   468    44.00   45.18   41.84   44.41   45.38   44.38 
 22   422    42.68   43.88   40.29   43.21   44.30   43.21 
 21   424    41.37   42.57   38.78   42.03   43.23   42.06 
 20   344    40.07   41.26   37.33   40.87   42.16   40.94 
 19   302    38.77   39.96   35.93   39.72   41.06   39.82 
 18   290    37.49   38.65   34.57   38.58   39.95   38.71 
 17   234    36.22   37.34   33.27   37.43   38.80   37.60 
 16   211    34.96   36.03   32.02   36.28   37.60   36.50 
 15   158    33.72   34.72   30.82   35.12   36.34   35.39 
 14   122    32.51   33.42   29.68   33.95   35.00   34.28 
 13   115    31.33   32.12   28.62   32.75   33.58   33.18 
 12    75    30.18   30.85   27.65   31.52   32.07   32.09 
 11    67    29.09   29.62   26.79   30.26   30.51   31.01 

*MIRT=Modified IRT; 3P=3 parameters re-estimated; FB=Fixed 
b's; E=Equipercentile; L=Levine; 37 and 38 refer to subtests 
37 and 38. 
Table C1 cont'd 

RAW   FRQ   MIRT37  MIRT38    3P37    3P38    FB37    FB38 
---   ---   ------  ------  ------  ------  ------  ------ 
 10    52    28.08   28.44   26.09   28.98   28.97   29.96 
  9    35    27.15   27.34   25.60   27.68   27.59   28.95 
  8    19    26.38   26.36   25.36   26.37   26.46   27.98 
  7    15    25.15   25.16   24.26   25.06   25.22   27.06 
  6     8    23.92   23.93   23.09   23.63   23.98   25.92 
  5     2    22.70   22.71   21.91   22.61   22.75   24.58 
  4     1    21.47   21.48   20.74   21.38   21.51   23.23 
  3     2    20.24   20.25   20.00   20.16   20.27   21.68 
  2     0    20.00   20.00   20.00   20.00   20.00   20.53 
  1     0    20.00   20.00   20.00   20.00   20.00   20.00 
  0     0    20.00   20.00   20.00   20.00   20.00   20.00 
Table C2 

Linear and Equipercentile Conversions 
Regular Group 
Section II 



RAW 


FRQ 


T37 


T38 


L37 


L38 


E37 


E38 


ttt 


ttt 


ttttt 


ttttt 


ttttt 


ttttt 


ttttt 


ttttt 


10 


58 


67,35 


66.50 


68.11 


66.72 


66.60 


66.38 


39 


109 


65.89 


65.21 


66.89 


65.13 


65.09 


65.60 


38 


135 


61.13 


63.92 


65.31 


61«15 


63.61 


61.68 


37 


167 


62.97 


62.61 


63. 80 


62.87 


62.21 


63.62 


36 


235 


61.51 


61.35 


62.25 


61.58 


60.90 


62.13 


35 


276 


60.05 


60.06 


60.70 


60.30 


59.70 


60.71 


31 


326 


58.59 


58.78 


59.16 


59.02 


58.19 


59.25 


33 


362 


57.13 


57.19 


57.61 


57.71 


57.27 


57.92 


32 


151 


55.67 


56.20 


56.06 


56.15 


55.95 


56.58 


31 


138 


51.21 


51.92 


51.52 


55. 17 


51.53 


55.23 


30 


153 


52.75 


53.63 


52.97 


53.89 


52.97 


53.95 


29 


501 


51.29 


52.31 


51.12 


52.61 


51.38 


52.61 


28 


513 


19.83 


51.06 


19.87 


51.32 


19.75 


51.29 


27 


551 


18.37 


19.77 


18.33 


50.01 


18.22 


19.81 


26 


512 


16.91 


18.18 


'iS.78 


18.76 


16.71 


18.36 


2\ 


508 


15.11 


17.20 


15.23 


17.17 


15.07 


16.85 


21 


195 


13.98 


15.91 


13.69 


16.19 


13.52 


15.73 


23 


168 


12.52 


11.62 


12.11 


11.91 


12.12 


11.62 


22 


122 


11.06 


13.31 


10.59 


13.63 


10.68 


13.15 


21 


121 


39.60 


■ 12.05 


39.05 


12.31 


39.<i2 


12.27 


20 


311 


38.11 


10.76 


37.50 


11.06 


37.70 


10.88 


19 


302 


36.68 


39.18 


35.95 


39.78 


36.22 


39.65 


18 


290 


35.22 


38.19 


31.10 


38.50 


31.99 


38.65 


17 


231 


33.76 


36.91 


32.86 


37.21 


33.65 


37.61 


16 


211 


32.30 


35.62 


31.31 


35.93 


32.09 


36.19 


15 


158 


30.81 


31.33 


29.76 


31.65 


30.65 


35.38 


11 


122 


29.38 


33.05 


28.22 


33.37 


29.12 


31.29 


13 


115 


27.92 


31.76 


26.67 


32.08 


27.95 


33.19 


12 


75 


26.16 


30.17 


25.12 


30.80 


26.20 


32.03 


11 


67 


25.00 


29.19 


23.58 


29.52 


23.23 


30.72 


10 


52 


23.51 


27.90 


22.03 


28.23 


21.19 


29.20 


9 


35 


22.08 


26.61 


20.18 


26.95 


20.87 


27.68 


8 


19 


20.62 


25.33 


18.93 


25.67 


20.51 


26.17 


7 


15 


19.16 


21.01 


17.39 


21.39 


20.22 


25.50 


6 


8 


17.70 


22.75 


15.81 


23.10 






5 


2 


16.21 


21.17 


11.29 


21.82 






1 


1 


11.78 


20.18 


12.75 


20.51 






3 


2 


13.32 


18.89 


11.20 


19.26 






2 


0 


11.86 


17.61 


9.65 


17.97 






1 


0 


10.10 


16.32 


8.11 


16.69 






0 


0 


8.93 


15.03 


6.56 


15.11 
Table C3 

IRT Conversions, Controlled Group 
Section II 

RAW   FRQ   MIRT37  MIRT38    FB37    FB38 
---   ---   ------  ------  ------  ------ 
 40    58    65.83   65.83   65.83   65.83 
 39   109    64.71   64.66   64.23   64.53 
 38   135    63.49   63.53   62.95   63.37 
 37   167    62.24   62.40   61.71   62.21 
 36   235    60.98   61.27   60.50   61.04 
 35   276    59.71   60.12   59.27   59.83 
 34   326       --   58.94   58.04   58.58 
 33   362    57.11   57.75   56.79   57.30 
 32   454    55.80   56.53   55.53   55.99 
 31   438    54.48   55.30   54.28   54.67 
 30   453    53.14   54.04   53.05   53.35 
 29   504    51.80   52.77   51.84   52.04 
 28   513    50.45   51.49   50.66   50.75 
 27   551    49.09   50.20   49.51   49.50 
 26   512    47.73   48.89   48.39   48.29 
 25   508    46.37   47.58   47.31   47.11 
 24   495    45.00   46.27   46.25   45.97 
 23   468    43.64   44.95   45.21   44.86 
 22   422    42.28   43.63   44.18   43.77 
 21   424    40.93   42.31   43.16   42.71 
 20   344    39.52   40.99   42.15   41.65 
 19   302    38.25   39.68   41.13   40.60 
 18   290    36.92   38.36   40.09   39.56 
 17   234    35.61   37.05   39.03   38.51 
 16   211    34.31   35.74   37.94   37.46 
 15   158    33.04   34.43   36.81   36.39 
 14   122    31.80   33.13   35.62   35.31 
 13   115    30.61   31.84   34.36   34.20 
 12    75    29.47   30.56   33.02   33.08 
 11    67    28.40   29.31   31.59   31.93 
 10    52    27.43   28.10   30.08   30.76 
  9    35    26.57   26.95   28.51   29.57 
  8    19    25.90   25.90   26.87   28.38 
  7    15    24.69   24.69   25.44   27.20 
  6     8    23.49   23.49   24.19   25.92 
  5     2    22.29   22.29   22.94   24.58 
  4     1    21.08   21.08   21.69   23.25 
  3     2    20.00   20.00   20.44   21.91 
  2     0    20.00   20.00   20.00   20.58 
  1     0    20.00   20.00   20.00   20.00 
  0     0    20.00   20.00   20.00   20.00 
Table C4 

Linear and Equipercentile Conversions 
Controlled Group 
Section II 

RAW   FRQ     T37     T38     L37     L38     E37     E38 
---   ---   -----   -----   -----   -----   -----   ----- 
 40    58   65.11   65.98   65.86   66.74   64.80 
 39   109   63.85   64.75   64.52   65.46   63.09   66.23 
 38   135   62.59   63.52   63.18   64.17   61.80   65.31 
 37   167   61.33   62.30   61.84   62.89   60.77   63.61 
 36   235   60.07   61.07   60.50   61.60   59.93   61.90 
 35   276   58.81   59.84   59.16   60.32   59.10   60.18 
 34   326   57.55   58.61   57.82   59.03   57.81   59.00 
 33   362   56.29   57.38   56.48   57.74   56.59   57.98 
 32   454   55.03   56.16   55.14   56.46   55.59   56.91 
 31   438   53.77   54.93   53.80   55.17   54.46   55.81 
 30   453   52.51   53.70   52.46   53.89   53.05   54.53 
 29   504   51.25   52.47   51.12   52.60   51.56   53.14 
 28   513   49.99   51.24   49.78   51.32   50.14   51.59 
 27   551   48.73   50.01   48.44   50.03   49.01   49.90 
 26   512   47.47   48.79   47.10   48.74   47.84   48.19 
 25   508   46.21   47.56   45.76   47.46   46.57   46.56 
 24   495   44.95   46.33   44.42   46.17   45.23   45.21 
 23   468   43.69   45.10   43.08   44.89   43.86   44.06 
 22   422   42.43   43.87   41.74   43.60   42.51   43.07 
 21   424   41.17   42.65   40.40   42.32   41.41   42.00 
 20   344   39.91   41.42   39.06   41.03   40.51   40.79 
 19   302   38.65   40.19   37.72   39.75   39.61   39.68 
 18   290   37.39   38.96   36.38   38.46   38.59   38.67 
 17   234   36.13   37.73   35.04   37.17   37.58   37.61 
 16   211   34.87   36.51   33.70   35.89   36.40   36.49 
 15   158   33.61   35.28   32.36   34.60   35.14   35.44 
 14   122   32.35   34.05   31.02   33.32   33.89   34.56 
 13   115   31.09   32.82   29.68   32.03   32.30   33.68 
 12    75   29.83   31.59   28.34   30.75   30.13   32.73 
 11    67   28.57   30.37   27.00   29.46   28.03   31.77 
 10    52   27.31   29.14   25.66   28.18   27.76   30.52 
  9    35   26.05   27.91   24.32   26.89   27.50   29.14 
  8    19   24.79   26.68   22.98   25.60   27.24   26.89 
  7    15   23.53   25.45   21.64   24.32   26.51   25.67 
  6     8   22.26   24.22   20.30   23.03 
  5     2   21.00   23.00   18.96   21.75 
  4     1   19.74   21.77   17.62   20.46 
  3     2   18.48   20.51   16.28   19.18 
  2     0   17.22   19.31   14.94   17.89 
  1     0   15.96   18.08   13.60   16.60 
  0     0   14.70   16.86   12.26   15.32 
Table C5 

IRT Conversions, Regular Group 
Section III 

RAW   FRQ   MIRT37  MIRT38    3P37    3P38    FB37    FB38 
---   ---   ------  ------  ------  ------  ------  ------ 
 60     8       --      --      --      --      --      -- 
 59    25       --      --      --      --      --      -- 
 58    34       --      --      --      --   66.22   66.93 
 57    53       --      --      --      --   65.25   65.89 
 56    56       --      --      --   64.62   64.33   64.84 
 55    78       --   63.35      --   63.53   63.42   63.79 
 54   102       --   62.33      --   62.46   62.51   62.73 
 53   104       --   61.32   64.55   61.40   61.61   61.68 
 52   154    62.09   60.33   63.65   60.37   60.71   60.64 
 51   158    61.15   59.36   63.09   59.36   59.83   59.62 
 50   164    60.25   58.40   62.27   58.36   58.95   58.62 
 49   192    59.34   57.45   61.40   57.39   58.09   57.63 
 48   224    58.43   56.51   60.49   56.43   57.23   56.66 
 47   240    57.52   55.58   59.53   55.47   56.38   55.71 
 46   251    56.61   54.66   58.53   54.53   55.53   54.77 
 45   259    55.70   53.75   57.51   53.56   54.68   53.84 
 44   280    54.78   52.84   56.45   52.64   53.84   52.92 
 43   291    53.87   51.94   55.37   51.72   53.00   52.01 
 42   287    52.94   51.04   54.27   50.81   52.16   51.11 
 41   278    52.02   50.15   53.15   49.91   51.31   50.22 
 40   296       --      --   52.03   49.02   50.47   49.34 
 39   307    50.16   48.38   50.89   48.15   49.63   48.47 
 38   302       --      --      --   47.30   48.79      -- 
 37   306       --      --      --      --   47.96   46.77 
 36   312       --      --      --      --      --   45.93 
 35   324       --      --      --   44.85   46.29      -- 
 34   281       --      --      --      --      --      -- 
 33   311       --      --      --      --      --      -- 
 32   319    43.63   42.33   43.23   42.52   43.80   42.67 
 31   289    42.71   41.49   42.20   41.75   42.97   41.87 
 30   262    41.79   40.66   41.19   40.99   42.13   41.07 
 29   308    40.68   39.84   40.18   40.23   41.28   40.27 
 28   280    39.98   39.02   39.20   39.47   40.43   39.47 
 27   266    39.08   38.22   38.22   38.70   39.57   38.66 
 26   226    38.20   37.43   37.26   37.94   38.70   37.85 
 25   226    37.33   36.64   36.32   37.16   37.82   37.03 
 24   203    36.47   35.87   35.41   36.39   36.95   36.22 
 23   216    35.63   35.12   34.54   35.63   36.07   35.41 
 22   212    34.80   34.38   33.72   34.87   35.20   34.60 
 21   178    34.00   33.65   32.96   34.12   34.35   33.81 
 20   155    33.22   32.95   32.26   33.40   33.51   33.05 
Table C5 cont'd 

RAW   FRQ   MIRT37  MIRT38    3P37    3P38    FB37    FB38 
---   ---   ------  ------  ------  ------  ------  ------ 
 19   139    32.46   32.26   31.62   32.71   32.69   32.31 
 18   123    31.74   31.60   31.04   32.07   31.88   31.63 
 17    99    31.05   30.96   30.52   31.46   31.10   30.99 
 16    73    30.40   30.36   30.04   30.90   30.34   30.40 
 15    67    29.80   29.79   29.62   30.38   29.61   29.87 
 14    51    29.25   29.27   29.24   29.9    28.91   29.39 
 13    42    28.77   28.79   28.90   29.45   28.23   28.95 
 12    17    28.36   28.37   28.41   28.99   27.42   28.53 
 11    11    27.56   27.58   27.57   28.43   26.60   28.13 
 10     6    26.72   26.74   26.73   27.56   25.79   27.45 
  9     7    25.89   25.91   25.89   26.68   24.97   26.57 
  8     1    25.06   25.08   25.05   25.81   24.15   25.69 
  7     1    24.23   24.24   24.21   24.94   23.33   24.81 
  6     1    23.40   23.41   23.37   24.06   22.51   23.93 
  5     2    22.56   22.58   22.53   23.19   21.69   23.05 
  4     2    21.73   21.74   21.68   22.31   20.87   22.17 
  3     1    20.90   20.91   20.84   21.44   20.05   21.29 
  2     1    20.07   20.08   20.00   20.56   20.00   20.41 
  1     1    20.00   20.00   20.00   20.00   20.00   20.00 
  0     0    20.00   20.00   20.00   20.00   20.00   20.00 
Table C6 

Linear and Equipercentile Conversions 
Regular Group 
Section III 

RAW FRQ T37 T38 L37 L38 E37 E38 

••• •••• •••••• •••••• •••••• •••••• •••••• 

66.62 - - 

65.76 - 67.27 

6^4.90 66.21 66.27. 

6H,0H 65. ^45 66.06 

63.19 6J4.70 65.00 

62.33 63.91 63.63 

61. n 63.16 62.59 

60.61 62.^46 61.59 

59.75 61.73 60.61 

58.89 60.61 59.37 

58.03 59.58 58.23 

57.17 58.81 57.31 

56.31 58.06 56.^45 

55.^45 57.32 55.60 

5^4.59 56.5^4 5^4.66 

53.73 55.68 53.7^ 

52.87 5^4.82 52.86 

52.01 5^4.00 51.97 

51.16 53.39 51.07 

50.30 52.77 50.13 

'49.'4'4 52.13 ^49. 17 

48.58 51.^48 ^48. 19 

^47. 72 50.58 ^47.27 

146.86 '4^.60 146. ^40 

J46.00 ^48. 60 ^45. 66 

^45.1^4 i47.78 ^414. 95 

i4i4.28 ^47.0^4 ^414. 33 

^43.^42 ^46. 13 ^43. 74 

42.56 45.17 43.10 

41.70 44.16 42.38 

40.84 43.31 41.58 

39.98 42.48 40.59 

39.13 41.73 39.59 

38.27 40.98 38.61 

37.41 40.20 37.73 

36.55 39.15 36.90 

35.69 37.98 36.20 

34.83 36.84 35.46 

33.97 35.87 34.68 



60 


8 


70.97 


67.37 


71.77 

1 ••11 


59 


25 


69.97 


66.48 


70.76 


°58 


34 


68.98 


65.58 


69.75 


ii 


53 


67.99 


64.69 


68.74 


'36 


56 


67.00 


63.80 


67.74 


55 


78 


66.01 


62.91 


66.73 


54 


102 


65.01 


62.02 


65.72 

v^ e 1 b 


53 


104 


64.02 


61 . 12 

wit lb 


6ii.71 


52 


154 


63.05 


60.23 


63.70 

w ^ e 1 w 


51 


158 


62«01 


59.34 


62.69 






61 0^ 


3 o • ^3 


61 6R 






60 06 


57 55 
7 r • ?w 


60 67 




22^4 

b b ~ 




56.66 


v59.66 


47 




58.07 


55.77 


58.66 


46 


251 


57.08 


54.88 


57.65 
^ 1 . v^ 


45 


259 

^ ^ 


56.09 


53.99 


56.64 


44 


280 


55. 10 


53.09 


55.63 


43 


291 


54. 11 


52.20 


54.62 


42 


287 


53.11 


51.31 


53.61 


41 


278 


52. 12 


50.12 


52.60 

. WW 


40 


296 


51.13 


49.53 


51.59 


39 


307 


50.14 


48.63 


50.58 


38 


302 


49.15 


47.74 


49.58 


37 


306 


48.16 


46.85 


48.57 


36 


312 


47.16 


45.96 


47.56 


35 


324 


46.17 


45.07 


46.55 


34 


281 


45.18 


44.17 


45.54 


33 


311 


44.19 


43.28 


44.53 


32 


319 


43.20 


42.39 


43.52 


31 


289 


42.21 


41.50 


42.51 


30 


262 


41.21 


40.60 


41.50 


29 


308 


40.22 


39.71 


40.50 


28 


280 


39.23 


38.82 


39.49 


27 


266 


38.24 


37.93 


38.48 


26 


226 


37.25 


37.04 


37.47 


25 


226 


36.26 


36.14 


36.46 


24 


203 


35.26 


35.25 


35.45 


23 


216 


34.27 


34.36 


34.44 


22 


212 


33.28 


33.47 


33.43 



Table C6 cont'd 

RAW FRQ T37 T38 L37 L38 E37 E38 



21 


178 


32.29 


32.58 


32.42 


33.11 


31.94 


33.78 


• 20 


155 


31.30 


31.68 


31.42 


32.25 


34.17 


32.96 


19 


139 


30.31 


30.79 


30.41 


31.39 


33.**2 


3; .32 


18 


123 


29.31 


29.90 


29.10 


30.53 


32.87 


31.65 


17 


99 


28.32 


29.01 


28.39 


29.67 


32.33 


30.89 


16 


73 


27.; 33 


28.12 


27.38 


28.81 


31.52 


29.88 


15 


67 


26.34 


27.22 


26.37 


27.95 


29.82 


25.52 


14 


51 


25.35 


26.33 


25.36 


27.09 


27.92 




13 


42 


24.36 


25.44 


24.35 


26.24 


27.04 




12 


17 


23.36 


24.55 


23.34 


25.38 


26.72 




11 


11 


22.37 


23.66 


22.34 


26.52 


26.41 




10 


6- 


21.38 


22.76 


21.33 


23.66 


26.10 




9 


7 


20.39 


21.87 


20.32 


22.80 






8 


1 


19.10 


20.98 


19.31 


21.94 






7 


1 


18.41 


20.09 


18.30 


21.08 






6 


1 


17.41 


19.19 


17.29 


20.22 






5 


2 


16.112 


18.30 


16.28 


19.36 








2 


15.13 


17.41 


15.27 


18.50 






3 


1 


14.114 


16.52 


14.26 


17.64 






2 


1 


13.15 


15.63 


13.26 


16.78 






1 


1 


12.15 


11.73 


12.25 


15.92 






0 


0 


11.16 


13.84 


11.24 


15.06 
Table C7 

IRT Conversions, Controlled Group 
Section III 

RAW   FRQ   MIRT37  MIRT38   IRT37   IRT38 
---   ---   ------  ------  ------  ------ 
 60     8    68.56   68.56   68.55   68.55 
 59    25    68.17   67.27   67.36   67.51 
 58    34    67.68   66.12   66.21   66.50 
 57    53    67.12   65.03   65.20   65.52 
 56    56    66.51   63.98   64.24   64.55 
 55    78    65.86   62.98   63.32   63.59 
 54   102    65.17   61.99   62.42   62.64 
 53   104    64.44   61.01   61.51   61.70 
 52   154    63.70   60.09   60.65   60.76 
 51   158    62.92   59.16   59.79      -- 
 50   164       --      --      --      -- 
 49   192    61.31   57.12   58.11   58.04 
 48   224    60.50   56.42   57.29   57.16 
 47   240    59.66      --   56.47      -- 
 46   251    58.81   54.61   55.65   55.40 
 45   259    57.94   53.72   54.84   54.53 
 44   280    57.05   52.82   54.03   53.66 
 43   291    56.16   51.93   53.22   52.78 
 42   287    55.25   51.04   52.40   51.90 
 41   278    54.32   50.15   51.58   51.02 
 40   296    53.38   49.26   50.76   50.14 
 39   307    52.43   48.37   49.94   49.25 
 38   302    51.46   47.49   49.11   48.37 
 37   306    50.48   46.60   48.29   47.49 
 36   312    49.49   45.72   47.46   46.60 
 35   324    48.49   44.85   46.63   45.72 
 34   281    47.48   43.98   45.80   44.84 
 33   311    46.46   43.11   44.97   43.96 
 32   319    45.44   42.26   44.14   43.08 
 31   289    44.42   41.41   43.30   42.20 
 30   262    43.39   40.57   42.46   41.32 
 29   308    42.37   39.73   41.62   40.44 
 28   280    41.35   38.91   40.77   39.55 
 27   266    40.34   38.11   39.91   38.66 
 26   226    39.35   37.71   39.05   37.78 
 25   226    38.36   36.53   38.19   36.90 
 24   203    37.39   35.77   37.33   36.04 
 23   216    36.44   35.03   36.47   35.20 
Table C7 cont'd 

RAW   FRQ   MIRT37  MIRT38   IRT37   IRT38 
---   ---   ------  ------  ------  ------ 
 22   212    35.51      --   35.62   34.38 
 21   178    34.61   33.60   34.78   33.60 
 20   155    33.73   32.91   33.96   32.87 
 19   139    32.89   32.25   33.16   32.19 
 18   123    32.08   31.61   32.38   31.55 
 17    99    31.31   31.00   31.64   30.98 
 16    73    30.59   30.42   30.93   30.45 
 15    67    29.92   29.86   30.27   29.97 
 14    51    29.31      --   29.65   29.53 
 13    42    28.78   28.84   29.09   29.12 
 12    17    28.36   28.37   28.58   28.73 
 11    11    27.53   27.55   27.81   28.20 
 10     6    26.70   26.72   26.96   27.34 
  9     7       --   25.89   26.11   26.49 
  8     1    25.04   25.05   25.25   25.63 
  7     1    24.20   24.22   24.40   24.76 
  6     1    23.37   23.39   23.55   23.92 
  5     2    22.54   22.56   22.70   23.07 
  4     2    21.71   21.73   21.65   22.21 
  3     1    20.66   20.89   21.00   21.36 
  2     1    20.05   20.06   20.15   20.50 
  1     1    20.00   20.00   20.00   20.00 
  0     0    20.00   20.00   20.00   20.00 
Table C8 



Linear and Equipercentile Conversions 
Controlled Group 
Section III 





FRO 


T^7 


* ^o 




UOQ 


Ci3 r 


C30 


••• 


••• 


fleeee 


••••• 


••••• 




• •••• 


••••• 




a 
o 


7P Pi 

1 CaC 1 


67 in 

w r * iv 


7il 91 


' 66 
00 *93 








C7 


71 PI 
(■•CI 


6ft 911 

VU * 


79 19 


6c; 67 
09. 0 f 










70 Pi 


v7* ^0 


SO nil 


61 Ro 
0 *t * Oc 


97 
•99 *c 1 




57 




• 6Q.P1 


61 .5^ 

W^l * 


7n Qfi 

f u* 


69 07 
03* 


6>; 71 

op* f"* 








00 eC 1 


69 67 


60 07 
0>*0f 


69 1 9 

03* 1c 


* 6^ 9n 

. 09*cU 






r 0 


Ore cc 


69 A1 
Oc * 0 1 


60 70 
00* f9 


69 96 
0£ * cO 


611 91 
vH * CI 


69 lin 
03 . 




1 no 


OOeCC 


' 61 OR 
0 1 * y 9 


67 71 

0 f * f 1 . 


61 Hi 
0 1 * HI 


69 9Ct 


61 c;ii 

0 1 *9** 


53 


i nJi 


05 9 22 


01 * 10 


0^0^ Ao 
00* 02 


An CA 
00*90 


Ao A4 
02* 01 


An 04 
00* 31 


52 


i CJl 


< Jl 

01*22 


00*21 


I^C CJl 

09*51 


CO f n 
59.70 


A 4 0*7 

01*97 


CO 1 0 
99.18 


51 


i CO 

150 


iC9 OO 

03*22 


en 9Q 
59.30 


l%Ji Jii^ 
01* 10 


CO ' Oc 

50. 05 


An on 

00* 90 


CO 90 
98. 39 


50 


1 Aji 


<9 oo 
02e2c 


CO CO 

50.52 


03.37 


c 0 nn 
50.00 


CO 

59.78 


' C7 09 

97.82 


Jl A 


192 


01 .22 


57.07 


C A A A 
02*29 


CM 4 Jl 

5* . 11 


CA 4 0 
59. 13 


cv oc 
57.25 


Jl b 


Jl 


00*22 


50*81 


1^ 4 A 4 

01*21 


c A on 
50*29 


CO Jl 0 

58.18 


CA AO 
50*03 


If 


> 9 Jin 


CO oo 

59a23 


cc nc 
55*95 


l%n 4 0 
00* 12 


cc llll 

55.11 


C*7 *?C 
97.75 


CC 00 
99.99 


•40 




C Q 09 

50*23 


cc nn 
55.09 


CO nil 
59.01 


Cll CO 

51.59 


C7 nn 
9 f .00 


cc on 
99. cU 


lie 
«f5 


259 


57e23 


CJl OJl 

91* 2*1 


C7 OA 
97. 90 


C? •TO 
53.73^ 


CA Ai 
90* Ul 


Cll 111 

9'** I** 


Jl Jl 


1 0 f% 

2o0 


50*23 


53.38 


C £ Off 

5o*o7 


C A OB 

52*00 


c-c n A 

55.00 


CO 00 
53.23 


Jl ^ 


291 


55*23 


CO CO 

52*52 


cc *7A 

55* 79 


c 0 no 
52*03 


CJl 4 0 
51. 19 


CO 00 
52*39 


Jl 


287 


e Jl 19 

5*1.23 


51 .00 


C Jl f 4 
51*71 


£4 4 *f 
51.17 


CO AA 
53.00 


C 4 All 
91.01 




278 


53.23 


50.61 


53*62 


50.32 


PA 4 A 

53. 13 


PA A 

50.79 


10 




52.23 


ll A AP 

49.95 


52*51 


ll A Jl f 
19.17 


e A ll f 

52.17 


Jl A JlC 
19.15 


39 


307 


i!r 4 '^Ji 

51*2*1 ^ 


Jl A A A 
19.09 


C 4 JiA 

51 . 10 


JlO A 4 

18*01 


C 4 Ao . 

9 1 . 02 


JlO 1 1 

18* 11 


38 


302 


A Ui 

50*21 


Jl 0 

18. 23 


e A At 

50.37 


Jl *f 

17.70 


en c 4 
50*51 


Jl f 4 n 
17*10 


37 


30o 


ll A 111 


Jl iv <i 0 

47.38 


Jl A A A 

49,29 


hC A 4 
10*91 


Jl A 0*V 
19.37 


Jl A OA 

10* 20 


36 


3i2 


Jl 0 OJl 

1o*21 


Jl £ CO 

10*52 


Jl 0 04 
18*21 


Jl A nA 
10*00 


llO 00 
18*32 


lie Ac 
*19*09 


35 


32*1 


Jl*? OJl 
17e 21 


lie Sin 

15* 00 


Ji*y 4 0 
17. 12 


lie on 
M9. 20 


ll*? CO 
**7.92y 


lie nil 

H9* U*l 




cOI 


911 

* *fD*C*l 


nil On 
•♦•i*ou 


HA nil 
H 0* U*l 


HH 9C 


Jl A 7A 
•to* r 0 ^ 


1U 10 


33 


311 


15*21 


^13.95 


11*96 


13.i>f? 


46.07 


43.97 


32 


319 


11*21 


13.09 


13.87 


12*61 


45.26 


43.43 


31 


289 


^13. 25 


12*23 


12*79 


11.79 


44.34 


42.66 


30 


262 


12*25 


^11*37 


11*71 


10*91 


43.44 


41.86 


29 


308 


H*25 


10*52 


10*62 


10*08 


42.53 


40.91 


28 


280 


il0*25 


39.66 


39. 5U 


39.23 


41.78 


39.95 


27 


266 


39.25 


36*80 


38*16 


38*38 


41.08 


38.95 


26 


226 


38*25 


37.9^1 


37-38 


37*53 


40.20 


38.00 


25 


226 


37*25 


37.09 


36*29 


36*67 


39.05 


37.08 


24 


203 


36*26 


36*23 


35*21 


35*82 


37.73 " 


36.33 


23 


216 


35*26 


35.37 


31.13 


31.97 


36.46 


35.63 


22 


212 


3'«e26 


3*1.51 


33.01 


31.11 


35.63 


35.01 


21 


178 


33.26 


33.66 


31.96 


33.26 


34.93 


34.22 


20 


155 


32*2fi 


32*80 


30*88 


32*11 


34.20 


33.31 


19 


139 


31.26 


31.91 


29*79 


31.55 


33.*<1 


32.92 
Table C8 cont'd 

RAW   FRQ     T37     T38     L37     L38     E37     E38 
---   ---   -----   -----   -----   -----   -----   ----- 
 18   123   30.26   31.08   28.71   30.70   32.60   32.53 
 17    99   29.26   30.23   27.63   29.85   31.78   32.14 
 16    73   28.27   29.37   26.54   29.00   30.08   30.17 
 15    67   27.27   28.51   25.46   28.14   27.52   25.25 
 14    51   26.27   27.65   24.38   27.29   26.14 
 13    42   25.27   26.80   23.29   26.44   25.83 
 12    17   24.27   25.94   22.21   25.58   25.52 
 11    11   23.27   25.08   21.13   24.73   25.21 
 10     6   22.27   24.22   20.04   23.68   24.90 
  9     7   21.27   23.37   18.96   23.02 
  8     1   20.28   22.51   17.88   22.17 
  7     1   19.28   21.65   16.79   21.32 
  6     1   18.28   20.79   15.71   20.47 
  5     2   17.28   19.94   14.63   19.61 
  4     2   16.28   19.08   13.54   18.76 
  3     1   15.26   18.22   12.46   17.91 
  2     1   14.28   17.36   11.38   17.05 
  1     1   13.29   16.51   10.29   16.20 
  0     0   12.29   15.65    9.21   15.35 

Table C9

Scaled Score Means and Standard Deviations
Modified IRT, Fixed b's, 3-parameter Re-estimated, Tucker and Levine Equatings








                         Regular Group                        Controlled Group
                                II            III                    II            III
                         N      Mn.    S.D.   Mn.    S.D.     N      Mn.    S.D.   Mn.    S.D.

Modified IRT  3ETF6(37)  339    49     8.1    50     8.2      311    47     9.3    49     9.1
              (38)       329    49     8.3    48     7.1      308    47     9.2    46     8.0

Fixed b's     3ETF6(37)  1011   49     7.6    49     7.3      1790   47     7.9    47     7.8
              (38)       988    48     7.8    47     7.1      1790   47     7.9    46     7.5

3-parameter   3ETF6(37)  1018   48     9.9    50     8.9
              (38)       988    48     8.3    47     7.1

Tucker        3ETF6(37)  1005   48     9.2    49     8.6      315    47     8.9    49     9.5
              (38)       980    48     8.1    47     7.8      305    47     8.7    46     8.0

Levine        3ETF6(37)  1005   47     9.8    50     8.7      315    46     9.1    49     10.3
              (38)       980    49     8.3    47     7.5      305    47     9.1    46     7.9
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